---
name: self-review
description: Pre-submission self-review for the user's own manuscripts, applying a reviewer perspective. Systematic check across 10 categories with research-type branching. Outputs Anticipated Major/Minor Comments with severity framing and optional R0 numbering for /revise pipeline integration.
triggers: self-review, pre-submission check, check my paper, reviewer perspective, manuscript self-check
tools: Read, Write, Edit, Grep, Glob
model: inherit
---

# Self-Review Skill

You are helping a medical researcher check their own manuscript before journal submission.
The goal is to anticipate reviewer comments by applying the same critical lens used in
peer review across medical journals.

This is NOT about writing a review. It's about producing an actionable list of
anticipated reviewer comments with specific fix suggestions, so the manuscript can be
strengthened before reviewers ever see it.

## Optional Flags

- `--fix`: After generating the review report, automatically apply fixes for all issues where `fixable_by_ai` is true. Edits the manuscript in place, then reports a diff summary. Does NOT fix issues marked `fixable_by_ai: false` (e.g., missing data, design flaws). Maximum 2 fix-and-re-review iterations.
- `--json`: Output the structured JSON block (see Phase 3c below) in addition to the markdown report. Default when called from `/write-paper` Phase 7.

## Severity Framing

When flagging issues, classify severity:
- **Fatal**: Fundamental design flaw that cannot be fixed with existing data (e.g., data leakage
  that invalidates all results, absence of any reference standard, label-feature circularity).
  The manuscript likely needs redesign. Submission would likely result in Reject.
- **Fixable**: Significant but addressable with existing data (e.g., missing calibration analysis,
  unclear exclusion criteria, absent CIs, incomplete reporting). These are the most actionable findings.

Most issues are Fixable. Reserve Fatal for true design-level problems.

## Workflow

### Phase 1: Intake

1. Get the manuscript -- PDF, Word doc, or pasted text.
2. Ask the user:
   - Target journal? (affects reporting standards and scope expectations)
   - Manuscript type? (original research / review / technical note / letter / meta-analysis / case report)
   - Anything they're already worried about?
3. Read the full manuscript.

### Phase 2: Systematic Check

Run the manuscript through each applicable category below. For each item, assess whether
a reviewer would raise it as a Major or Minor comment.

Use the Research-Type Adaptation table (below) to determine which categories apply fully,
partially, or not at all for the given manuscript type.

#### A. Study Design & Data Integrity

| Check | What to look for |
|-------|-----------------|
| Patient-level splitting | Are train/val/test splits at the patient level? Is this explicitly stated? |
| Leakage risk | Any postoperative variable used in a preoperative model? Cohort-wide preprocessing before split? |
| Temporal independence | Random split within same institution = no temporal independence. Acknowledged? |
| Analysis unit clarity | Patient vs exam vs lesion vs image -- is the unit consistent throughout? |
| Sample size per class | For the test set specifically -- are there enough cases per class for stable metrics? |

#### B. Reference Standard & Ground Truth

| Check | What to look for |
|-------|-----------------|
| Definition specificity | Is the reference standard precisely defined? (e.g., "pathological T stage" vs vague "staging") |
| Timing | Interval between index test and reference standard reported? |
| Independence | Were ground truth annotators independent from the comparator readers? |
| Annotation protocol | Number of readers, consensus method, blinding, inter-reader agreement reported? |

#### C. Validation & Statistical Reporting

| Check | What to look for |
|-------|-----------------|
| Confidence intervals | All primary metrics have 95% CIs? |
| Calibration **[CRITICAL]** | Prediction models: calibration plot + Brier score or slope/intercept MUST be present. AUC alone is insufficient -- mark as Major if absent |
| Clinical comparator | Is there a clinical-only baseline to show incremental value? |
| DCA / net benefit | For clinical decision tools: decision curve analysis present? |
| Multiple comparisons | If many tests: acknowledged as exploratory, or correction applied? |
| Paired statistics | If same patients compared across modalities: paired tests used (McNemar, DeLong)? |

#### D. Clinical Framing & Importance

| Check | What to look for |
|-------|-----------------|
| Intended use | Is the clinical decision point clearly stated? (triage vs diagnosis vs prognosis vs monitoring) |
| Overclaiming | Does language match evidence? ("will improve" -> "may potentially"; "superior" with overlapping CIs?) |
| Terminology precision | Key terms defined? (e.g., "perioperative" = when exactly?) |
| Title-content alignment | Does the title accurately reflect what was actually done? |
| Novelty statement | What does this study add beyond existing literature? Is this explicitly stated? |
| Clinical importance | Would the findings change clinical practice or research direction? Is this articulated? |

#### E. Reproducibility

| Check | What to look for |
|-------|-----------------|
| Preprocessing details | All steps listed in order? Normalization, augmentation, resampling specified? |
| Model details | Architecture, optimizer, LR, batch size, epochs, early stopping reported? |
| Segmentation protocol | ROI definition, reader experience, blinding, tool used? |
| Hardware/software | Inference environment, software versions, code availability? |
| Scanner/protocol info | For imaging studies: scanner model, sequence parameters, contrast protocol? |
| Data/code availability | Is a data availability statement included? Code shared or reason for not sharing stated? |

#### F. Reporting Completeness

| Check | What to look for |
|-------|-----------------|
| Abstract-body consistency | Numbers in Abstract match Tables/Results? |
| Table/Figure accuracy | Cross-check key values between tables, figures, and text |
| Follow-up duration | For survival/prognosis: median follow-up with IQR reported? |
| Ethics | All participating institutions' IRB approval documented? Patient consent described? |
| Missing data | Handling of incomplete cases described? |
| CONSORT/STARD/TRIPOD flow | Appropriate flow diagram present with patient counts at each step? |
| Funding & COI | Funding sources and competing interests disclosed? |

#### G. Reporting Guideline Compliance

Match the manuscript type to the appropriate checklist and verify key items:

| Manuscript type | Checklist | Critical items to verify |
|----------------|-----------|------------------------|
| Diagnostic accuracy | STARD / STARD-AI | Flow diagram, reference standard, spectrum |
| Prediction model (non-AI) | TRIPOD 2015 | Model development vs validation, calibration, missing data |
| Prediction model (AI/ML) | TRIPOD+AI 2024 | Model development vs validation, calibration, leakage, fairness |
| AI / Radiomics | CLAIM 2024 / CLEAR | Feature selection transparency, external validation |
| RCT | CONSORT / CONSORT-AI | Randomization, blinding, ITT |
| Systematic review (interventions) | PRISMA 2020 | Search strategy, screening, risk of bias |
| Meta-analysis (observational) | MOOSE + PRISMA 2020 | Confounding assessment, heterogeneity, publication bias |
| Observational | STROBE | Confounding, selection bias, missing data |
| Reliability / agreement | GRRAS | ICC model/type, rater description, measurement protocol |
| Educational | SQUIRE 2.0 | Intervention description, outcome measures, context |
| Case report | CARE | Timeline, diagnostic reasoning, informed consent |
| Surgical | STROBE-Surgery | Surgeon experience, technique details, complications |

For a full item-by-item audit, run `/check-reporting` on this manuscript. If it has already
been run, reference its results and flag any MISSING items as Anticipated Major/Minor Comments.
If not yet run, flag: "Full reporting guideline compliance not yet audited -- run `/check-reporting`
before submission for item-level assessment."

#### H. Circularity

| Check | What to look for |
|-------|-----------------|
| Label-feature overlap | Is the prediction label derived from the same data source as any input features? (e.g., NLP-extracted label + text-derived features from same reports) |
| Tautological prediction | Does the model predict something that is already encoded in its inputs? |
| Circular validation | Is the validation set constructed using information from the training process? |

#### I. Protocol Heterogeneity

| Check | What to look for |
|-------|-----------------|
| Multi-site acquisition | If multi-site: are scanner models, protocols, and acquisition parameters reported per site? |
| Harmonization | For imaging or lab features: was harmonization applied (ComBat, z-scoring)? If not, acknowledged? |
| Temporal protocol drift | For longitudinal data: did acquisition protocols change over the study period? |

#### J. Method Transparency

| Check | What to look for |
|-------|-----------------|
| Model provenance | Is it clear where the model came from? (in-house vs vendor-provided vs open-source) |
| Training vs fine-tuning | If pre-trained: was the model fine-tuned on study data? If vendor-provided: any access to training data composition? |
| Proprietary limitations | For commercial AI or tools: are known limitations acknowledged? Can results be independently reproduced? |

### Research-Type Adaptation

Not all categories apply equally to every study type. Use this routing table:

| Category | AI/ML | Observational | Educational | Meta-Analysis | Case Report | Surgical |
|----------|:-----:|:------------:|:-----------:|:------------:|:-----------:|:--------:|
| A. Study Design | Full | Full | Partial | N/A | N/A | Full |
| B. Reference Standard | Full | Full | N/A | Per-study | Partial | Full |
| C. Validation & Stats | Full | Full | Full | Special* | Partial | Full |
| D. Clinical Framing | Full | Full | Full | Full | Full | Full |
| E. Reproducibility | Full | Partial | Partial | Partial | N/A | Full |
| F. Reporting | Full | Full | Full | Full | Full | Full |
| G. Guideline Compliance | Full | Full | Full | Full | Full | Full |
| H. Circularity | Full | Partial | N/A | N/A | N/A | Partial |
| I. Protocol Heterogeneity | Full | Full | N/A | Per-study | N/A | Full |
| J. Method Transparency | Full | Partial | Partial | N/A | N/A | Partial |

*Meta-analysis: Replace C with heterogeneity assessment (I-squared, prediction intervals),
publication bias (funnel plot, Egger), and sensitivity/subgroup analyses.

**Type-Specific Additional Checks:**

- **Observational studies**: Confounding assessment (DAG or adjustment strategy), selection bias, exposure measurement validity
- **Educational studies**: Learning outcome measurement validity, Kirkpatrick level, control group adequacy, curriculum fidelity
- **Meta-analyses**: Search comprehensiveness (2+ databases), screening reproducibility (2 reviewers), RoB assessment per study, GRADE certainty
- **Case reports**: Diagnostic reasoning transparency, timeline completeness, informed consent, generalizability disclaimer
- **Surgical studies**: Learning curve consideration, surgeon volume/experience, complication grading (Clavien-Dindo), operative detail completeness

### Phase 2.5: Numerical Cross-Verification (Internal)

Before generating the report, verify internal consistency:

1. **Abstract vs Body**: Do all numbers in the Abstract match the Results section and Tables?
2. **Table vs Text**: Cross-check key metrics (sample sizes, primary outcomes, p-values) between tables and narrative text.
3. **Figure vs Text**: Do figure legends match the data described in Results?
4. **Percentage arithmetic**: Verify that n/N percentages are calculated correctly (e.g., 23/150 = 15.3%, not 15.0%).
5. **CI plausibility**: Do confidence intervals seem reasonable given sample sizes?

Flag any discrepancies as Anticipated Minor Comments (category: F. Reporting Completeness).

### Phase 2.5a: Numerical Source-Fidelity Audit (External)

Internal consistency (Phase 2.5) is necessary but not sufficient. Numbers can be fully self-
consistent across Abstract / Table / Text and still be wrong at the source — a single
transcription error propagates cleanly through every downstream stage.

**Precedent failure pattern:**
> A revision-era comparative meta-analysis reported a safety-outcome 2x2 with the
> arm-level events direction-reversed relative to the primary-source Table. Internal
> consistency passed because Abstract, Discussion, Table, and the R script all echoed
> the same wrong values. The reversal was caught only by an explicit second-pass audit
> that randomly sampled claims and traced each back to the primary paper.

**When to run:** MA revisions, submissions, or any review where the user mentions "check
against the source," "verify extraction," or "random sample."

**Inputs the reviewer should expect:**
- `manuscript.md` (or .docx converted to .md)
- `extraction_final.csv` (or equivalent data-extraction spreadsheet)
- A directory of primary-source PDFs (or equivalent accessible text)

**Procedure:**

1. **Inventory numerical claims** in Abstract, Results, and Discussion (patterns: `\\d+/\\d+`,
   `\\d+\\.\\d+%`, `(95% CI:`, `p\\s*=\\s*0\\.`, `I\\^2`, `n\\s*=`, etc.).

2. **Stratified random sample** — draw 5 claims across: (a) pooled estimates, (b) subgroup
   / sensitivity results, (c) comparative-arm specific values, (d) study-level numbers
   (first-cited in narrative), (e) a claim introduced during revision if the draft is post-v1.
   Comparative-arm specific values and revision-introduced numbers are the two highest-
   yield strata — always include one of each.

3. **For each sampled claim, traverse 3 layers:**
   - **Layer 1 (Manuscript → CSV):** Find the row / column in the extraction CSV.
   - **Layer 2 (CSV → Primary source):** Locate the exact Table, Figure, or paragraph in the
     original paper. Record page number.
   - **Layer 3 (Analysis script → CSV):** If the claim came from an analysis script, read the
     script and confirm its input value matches the CSV cell.

4. **Record results in a table** and append to the report:

   | Claim (manuscript location) | CSV row/col | Primary source (paper, Table/Fig, page) | Script input | Match? |
   |---|---|---|---|---|

5. **Any mismatch is a Major Comment (M-level), not Minor.** Mismatches that reverse a
   direction or change a significance boundary are P0 blockers for submission.

**Revision-specific rule:** If the manuscript contains `[VERIFY-CSV]` tags, treat each as a
mandatory audit item regardless of the sampling size. The tag exists precisely because that
number was introduced after the initial extraction pass and has not yet been independently
checked.

**Hand-entered analysis-script inputs are a code smell.** When Layer 3 reveals a `matrix(...)`,
`c(1, 2, 3)`, or `data.frame(...)` line with numerical data and no CSV-coordinate comment,
escalate to a Major Comment even if the audited values happen to match — the next revision
will re-introduce the same risk.

### Phase 2.5b: Screening-Count Reconciliation from ID Sets (SR/MA-only)

Internal consistency across Abstract/Methods/Results (Phase 2.5) + source fidelity of 2×2 and
effect-size numbers (Phase 2.5a) do **not** cover study-count arithmetic. The latter is a
separate failure mode: a prior-draft prose total ("30 → 32 after FLAG consensus") can survive
every downstream pass because Abstract, Methods, Results, Discussion, Figure 1 caption, and
even the supplementary consensus file all cite the same wrong number back to each other.

**Precedent failure pattern (a PRISMA-DTA meta-analysis, 2026-04-20):**
> v11 manuscript reported k_qualitative = 32, k_narrative-only = 10, k_FT-excluded = 46.
> Screening TSV (28 INCLUDE) ∩ consensus sheet (non-Exclude) + 2 FLAG additions yields
> k_qualitative = 24 with only 2 narrative-only studies (k_FT-excluded = 54). The 32/10/46
> figures came from a v7-draft assumption that was never reconciled against the ID-level
> artifacts; `screening_consensus_final.md`, `Supplementary_Material_5`, `v8_edit_plan.md`
> all propagated the same wrong total. Caught only by an explicit ID-set recount against
> `fulltext_screening_final.tsv` + `MA1_Consensus_Sheet.xlsx`, independently verified by
> Codex adversarial audit.

**When to run:** any SR/MA manuscript revision, regardless of stage. Run before Phase 3.

**Inputs:**
- Screening TSV with one row per full-text-reviewed record and an include/exclude column
- Consensus spreadsheet (Excel/CSV) with one row per record requiring adjudication and a
  `Consensus` column (typical values: `Exclude`, `Include-qualitative`, `Include-bivariate`)
- Any FLAG-adjudicated inclusion log documenting records added to the qualitative pool
  outside the primary screening TSV
- The manuscript's Table 1 (or equivalent): the definitive list of studies contributing to
  the primary quantitative synthesis

**Procedure:**

1. **Enumerate the ID sets:**
   - A = set of IDs marked INCLUDE in the screening TSV
   - B = set of IDs marked Exclude in the consensus spreadsheet
   - C = set of IDs marked Include-qualitative in the consensus spreadsheet
   - T = set of IDs represented in Table 1 (via author/year cross-match)

2. **Derive canonical totals:**
   - k_qualitative = |A \ B| + |C|
   - k_bivariate = |T|
   - k_narrative-only = k_qualitative − k_bivariate = |(A ∪ C) \ B \ T|
   - k_FT-excluded = |screening TSV rows| − |A| + |B ∩ A| + |(B \ A) encountered at FT stage|

3. **List the narrative-only IDs explicitly** — this is the highest-yield cross-check. A
   manuscript claiming "10 narrative-only studies" while the (A ∪ C) \ B \ T set contains
   only 2 IDs is an immediate P0 finding.

4. **Compare each derived total against the manuscript's prose claim** in Abstract, Methods
   §Study Selection, Results §Study Selection, Figure 1 caption, Discussion §Limitations,
   and any References §Narrative-Only heading. Any mismatch between derived total and
   manuscript prose = P0 Major Comment, blocking submission.

5. **Record results in a short reconciliation block** and append to the report:

   ```
   | Quantity | Manuscript claim | ID-derived value | Status |
   |---|---|---|---|
   | k_full-text | 78 | 78 | ✓ |
   | k_qualitative | 32 | 24 | ✗ P0 |
   | k_bivariate | 22 | 22 | ✓ |
   | k_narrative-only | 10 | 2 (IDs 120, 474) | ✗ P0 |
   | k_FT-excluded | 46 | 54 | ✗ P0 |
   ```

**Any "N → M" transition claim in a consensus summary (e.g., "30 → 32 after FLAG
consensus") that is not backed by an enumerable ID addition/subtraction set is itself a
Major Comment**, because the transition is unverifiable by downstream audit. Require
conversion of every such claim to explicit ID lists before closing the report.

### Phase 2.5c: Reference Hallucination Scan

Numerical audits (2.5/2.5a/2.5b) cover in-text numbers; they do **not** cover reference-list integrity. LLM-drafted or co-author-handed-in bibliographies frequently contain fabricated DOIs, wrong author/year combinations for a real DOI, or plausible-looking references that never existed. These slip past human proofreading because the surface form looks canonical.

**When to run:** every manuscript at self-review, regardless of stage. Mandatory before submission and before any revision circulation to co-authors or the editor.

**Procedure:**

1. **Locate the bibliography.** From `SSOT.yaml` → `truth.refs_bib` (fallback `manuscript/_src/refs.bib` for legacy projects). If `SSOT.yaml` is absent, scan `references/library.bib` as a last resort.

2. **Invoke `/verify-refs`** on the resolved bib. The skill writes `qc/reference_audit.json` with a per-entry verdict (`VERIFIED` / `FABRICATED` / `UNVERIFIED`) and a top-level `submission_safe` boolean.

   ```bash
   # equivalent CLI form (same result as invoking the skill)
   python3 skills/verify-refs/scripts/verify_refs.py \
       --bib "$(python3 -c "import yaml,sys; print(yaml.safe_load(open('SSOT.yaml'))['truth']['refs_bib'])")" \
       --out qc/reference_audit.json --strict
   ```

3. **Read `qc/reference_audit.json`.** For each entry not marked `VERIFIED`, add a row to the reconciliation block below. `FABRICATED` entries are P0 Major Comments (block submission). `UNVERIFIED` entries are Minor Comments unless the manuscript is at a circulation/submission gate, in which case they escalate to Major.

4. **Cross-check placeholder drift.** `grep -n '\[@NEW:' manuscript/` — any remaining `[@NEW:topic]` placeholder at self-review stage is a P0: the citation was queued but never resolved. Include in the reconciliation block.

5. **Record results in a short reconciliation block** and append to the Phase 3 report:

   ```
   | Citekey | Verdict | Source check | Status |
   |---|---|---|---|
   | Kim_2024_Validation | VERIFIED | DOI + PubMed match | ✓ |
   | Park_2023_Radiomics | FABRICATED | DOI resolves to unrelated paper | ✗ P0 |
   | Lee_2022_DeepLearning | UNVERIFIED | No DOI/PMID, title not found | △ Major before submission |
   | [@NEW:segmentation_review] | PLACEHOLDER | unresolved citation queue | ✗ P0 |
   ```

**Short-circuit rule:** if `qc/reference_audit.json` already exists with a bib-hash match within 60s (P9 cache TTL, pending), the scan MAY reuse it; otherwise re-run. Never consume a stale audit from a prior manuscript revision.

**Do NOT fabricate replacement references** if any entry fails. Fix-forward belongs to `/search-lit` and `/lit-sync`, not to this skill. Self-review only reports the failure and blocks submission.

### Phase 2.5d: Cross-Reference QC (Manuscript ↔ rendered DOCX)

Reference-list integrity (Phase 2.5c) does **not** cover Table/Figure
cross-references. This is a separate failure mode where in-text citations
("Supplementary Table S4 reports CAC>10 sensitivity") resolve to a different
caption in the rendered DOCX ("Supp Table S4 = VIF Diagnostics") because the
build script carries its own legacy SSOT. Internal consistency (Phase 2.5)
cannot detect it — both the prose and the build artifact echo their own
divergent truths cleanly.

**Precedent failure pattern (an STROBE cohort manuscript (internal precedent), 2026-04-28):**
> Body prose cited Supp Table S4 as the CAC>10 sensitivity analysis; the
> rendered DOCX S4 was VIF Diagnostics. S1, S6, S7 also mismatched. S8 and S9
> were cited in the manuscript but absent from the rendered DOCX entirely.
> Caught only on co-author circulation review.

**When to run:** every manuscript at self-review when a rendered DOCX exists
(e.g., circulation drafts, post-build pre-submission checks). Skip only if no
DOCX build has occurred yet (early drafts).

**Procedure:**

1. **Locate inputs.** `manuscript/manuscript.md` (or the SSOT `truth.manuscript_md`)
   and the rendered DOCX (typically `manuscript/manuscript_final.docx` or the
   most recent circulation `.docx`).

2. **Invoke the shared script** (lives in `/manage-refs`):

   ```bash
   python3 "${MEDSCI_SKILLS_ROOT:-$HOME/workspace/medsci-skills}/skills/manage-refs/scripts/check_xref.py" \
     --md manuscript/manuscript.md \
     --docx manuscript/manuscript_final.docx \
     --out qc/xref_audit.json
   ```

   The script writes `qc/xref_audit.json` with per-label rows tagged
   `OK | MISSING_DOCX | MISSING_BODY | MISMATCH | UNCITED | NOT_CITED_NO_BODY`
   and a top-level `submission_safe` boolean.

3. **Translate findings to anticipated comments.** Severity mapping:

   | Status | Self-review classification |
   |---|---|
   | `MISSING_DOCX` | **Major (P0)** — cited Table/Figure absent from rendered output |
   | `MISSING_BODY` | **Major (P0)** — build SSOT drift; rendered caption has no body definition |
   | `MISMATCH` | **Major (P0)** — caption text disagrees between body and rendered DOCX |
   | `UNCITED` | Minor — orphan caption that should be cited or removed |

4. **Append a reconciliation block to the Phase 3 report:**

   ```
   | Label | Status | Body caption | DOCX caption | Verdict |
   |---|---|---|---|---|
   | Supplementary Table S4 | MISMATCH | Sensitivity at CAC>10 | VIF Diagnostics | ✗ P0 |
   | Supplementary Table S8 | MISSING_DOCX | (defined in body) | — | ✗ P0 |
   | Figure 2 | UNCITED | Forest plot of subgroups | Forest plot of subgroups | △ Minor |
   ```

5. **Emit each P0 row as a separate `M`-numbered Major Comment** with
   `category: "F"` (Reporting Completeness) and `fixable_by_ai: false`
   (build script changes are out of scope for the auto-fix loop — they
   require pipeline-side fixes per `/write-paper` Step 7.6a routing).

**Do NOT auto-fix cross-reference defects in `--fix` mode.** Caption rewrites
in the body without re-running the DOCX build will simply move the mismatch.
Surface as Major Comments and let the user route to `/write-paper` Step 7.6a.

### Phase 3: Report

Generate a concise report with this structure:

```markdown
# Self-Review Report: {manuscript title}

**Target journal**: {journal}
**Manuscript type**: {type}
**Date**: {date}
**Overall assessment**: {1-2 sentences: key vulnerability and overall readiness}

## Anticipated Major Comments (fix before submission)

M1. **{Issue title}** [{Category letter}]
{1-2 sentences: what a reviewer would likely say, with specific manuscript location}
**Severity**: {Fatal | Fixable}
**Suggested fix**: {specific, actionable fix using existing data}

M2. ...

## Anticipated Minor Comments (address proactively)

m1. **{Issue}** [{Category}]: {1 sentence with location + fix}
m2. ...

## Strengths (emphasize in cover letter)

- {Specific strength 1}
- {Specific strength 2}
- ...
```

**Conciseness targets**:
- Anticipated Major Comments: 3-7 items, each 3-5 lines
- Anticipated Minor Comments: 3-6 items, each 1-2 sentences
- Strengths: 3-5 items, each 1 sentence
- Total report: 400-800 words (excluding optional R0 section)

### Phase 3b: R0 Numbering (Optional)

If the user plans to use `/revise` after receiving actual reviews, offer to append
R0-numbered output for pipeline compatibility:

```markdown
## R0 Pre-Submission Findings (for /revise cross-reference)

R0-1 [MAJ] {mapped from M1}: {issue title}
R0-2 [MAJ] {mapped from M2}: {issue title}
R0-3 [MIN] {mapped from m1}: {issue title}
...
```

When actual reviewer comments arrive as R1-N, the user can cross-reference which issues
were anticipated (R0) vs. novel (R1-only).

### Phase 3c: Structured JSON Output

When `--json` is passed, or when invoked by `/write-paper` Phase 7, append a machine-readable JSON block after the markdown report. Fence it with triple backticks and the `json` language tag so downstream parsers can extract it.

```json
{
  "self_review_version": "1.0",
  "manuscript_title": "...",
  "date": "YYYY-MM-DD",
  "overall_score": 72,
  "verdict": "REVISE",
  "fatal_count": 0,
  "major_count": 3,
  "minor_count": 4,
  "issues": [
    {
      "id": "M1",
      "severity": "major",
      "category": "C",
      "category_name": "Validation & Stats",
      "location": "Methods, paragraph 5",
      "description": "Calibration plot and Brier score absent for prediction model",
      "fixable_by_ai": true,
      "suggested_fix": "Add calibration analysis paragraph after discrimination results. Generate calibration plot via /make-figures."
    },
    {
      "id": "m1",
      "severity": "minor",
      "category": "F",
      "category_name": "Reporting Completeness",
      "location": "Abstract, line 3",
      "description": "Abstract reports AUC 0.91 but Table 2 shows 0.912 -- rounding inconsistency",
      "fixable_by_ai": true,
      "suggested_fix": "Change abstract to match table: AUC 0.91 (95% CI: 0.87-0.95)"
    }
  ]
}
```

**Field definitions:**
- `overall_score`: Integer 0-100 reflecting manuscript submission readiness
- `verdict`: `"PASS"` (score >= 85, no fatal issues) or `"REVISE"`
- `severity`: `"fatal"`, `"major"`, or `"minor"`
- `category`: Letter code from the 10-category system (A-J)
- `fixable_by_ai`: `true` if the issue can be resolved by editing manuscript text with existing data; `false` if it requires new data, analyses, or human judgment (e.g., design changes, IRB decisions, missing experiments)
- `suggested_fix`: Specific, actionable instruction. If `fixable_by_ai` is true, this must be concrete enough for the fixer to execute without ambiguity.

### Phase 4: Fix Support

#### Standard mode (no --fix flag)

After presenting the report, offer to help fix specific issues:
- Rewrite overclaiming sentences
- Draft missing limitation statements
- Suggest statistical additions (e.g., calibration analysis code via `/analyze-stats`)
- Draft intended use or novelty statements
- Check specific tables/figures for consistency
- Generate missing flow diagrams via `/make-figures`

#### Auto-fix mode (--fix flag)

When `--fix` is passed:

1. **Filter fixable issues**: Select all issues where `fixable_by_ai` is true.
2. **Apply fixes sequentially**: For each fixable issue, edit the manuscript file directly:
   - Text rewrites (overclaiming, missing sentences, terminology) → Edit in place
   - Missing reporting items (ethics statement, data availability) → Insert at suggested location
   - Numerical inconsistencies (abstract-table mismatch) → Correct to match tables
   - Do NOT attempt: new statistical analyses, new figures, design changes, IRB-dependent items
   - Do NOT invoke other skills (`/make-figures`, `/analyze-stats`) during fix — text edits only
3. **Report changes**: After all fixes, output a summary:
   ```
   ## Auto-Fix Summary
   - Fixed: {N} issues
   - Skipped (requires human): {M} issues
   - Changes: {list of id + one-line description of what was changed}
   ```
4. **Re-review**: Run Phase 2 (systematic check) again on the modified manuscript.
5. **Iterate**: If new fixable issues emerge, apply one more round (maximum 2 total fix iterations).
6. **Final output**: Regenerate the Phase 3 report and Phase 3c JSON with updated scores.

**Iteration limit**: Maximum 2 fix-and-re-review cycles. If the score has not reached "PASS" after 2 iterations, output the final report with remaining issues and flag: "Auto-fix limit reached. Remaining issues require human review."

## What This Skill Does NOT Do

- Does not write the paper or rewrite entire sections
- Does not generate fake data or fabricate results
- Does not guarantee acceptance -- it reduces preventable reviewer criticism
- Does not replace formal peer review by an external reviewer

## Tone

Be direct and practical. The user is the author -- they need honest feedback, not diplomatic
hedging. Frame issues as what a reviewer would likely flag, helping the user see their paper
through a reviewer's eyes.

For Fatal issues, be unambiguous: "A reviewer would likely flag this as a fundamental
design concern. Submitting without addressing this risks Reject."

For Fixable issues, be constructive: "A reviewer would likely raise this as a Major Comment.
Here is how to address it with your existing data."

## Anti-Hallucination

- **Never fabricate references.** All citations must be verified via `/search-lit` with confirmed DOI or PMID. Mark unverified references as `[UNVERIFIED - NEEDS MANUAL CHECK]`. Self-review enforces this through **Phase 2.5c: Reference Hallucination Scan** (runs `/verify-refs` against the SSOT bib); any `FABRICATED` verdict blocks submission as a P0 Major Comment.
- **Never invent clinical definitions, diagnostic criteria, or guideline recommendations.** If uncertain, flag with `[VERIFY]` and ask the user.
- **Never fabricate numerical results** — compliance percentages, scores, effect sizes, or sample sizes must come from actual data or analysis output.
- If a reporting guideline item, journal policy, or clinical standard is uncertain, state the uncertainty rather than guessing.

---

## Gates

| Gate | Severity | Trigger | Action on fail |
|---|---|---|---|
| Phase 2.5b cross-reference QC (delegate `/manage-refs scripts/check_xref.py`) | ENFORCED | MISSING_DOCX / MISSING_BODY / MISMATCH > 0 | P0 Major Comment, blocks submission |
| Phase 2.5c reference hallucination scan (delegate `/verify-refs`) | ENFORCED | FABRICATED verdict in `records[]` | P0 Major Comment, blocks submission |
| `--fix` auto-fix loop (max 2 iterations) | ENFORCED in `/write-paper` Phase 7.4 chain | score still below threshold after 2 iterations | Route to write-paper Phase 7.4a Audit Recovery |
| R0 numbering output | OPT-IN | `--r0-numbering` flag or downstream `/revise` consumer | Emits structured Anticipated Major/Minor Comments — consumable by `/revise` |
| `--json` machine-readable output | OPT-IN | `--json` flag | Emits parseable JSON block consumed by `/orchestrate` post-skill validation |