---
name: write-literature-review
description: Iterative literature-review workflow for a research topic. Use this skill to draft up to 10 search keywords, build a seed set from OpenAlex title-and-abstract matches, expand by backward and forward citations, screen candidates by title and abstract, repeat until no new relevant papers remain, rank the final set, inspect up to 30 full texts, build a concept-reference knowledge graph, and write a literature review grounded only in the final included references.
---

# Write Literature Review

## Overview

Use this skill when the user wants a serious literature review rather than a quick paper list. The workflow is intentionally iterative: search broadly, expand by citations, screen with judgment, repeat until saturation, then read deeply and synthesize.

Keep a separate execution log throughout the whole workflow at `logs/workflow_log.md`. Update it step by step as work happens. The log should record what you did, what inputs you used, what outputs you produced, what you learned, and what decision or next action followed.

## Workflow

1. Clarify the review scope and record it in `plan/review_plan.md`: topic, research questions, target field or venue, time window, inclusion criteria, exclusion criteria, desired review type, and output directory. Log the scope assumptions and planning decisions in `logs/workflow_log.md`.

2. Draft no more than 10 search keywords from the user's topic. Prefer short, high-recall phrases over full-sentence queries. Save them in `search/seed_keywords.json` through the search command, and log the keyword set plus why those terms were chosen.

3. Build the seed set by searching OpenAlex for papers whose title and abstract match at least one keyword:

   ```bash
   python /path/to/write-literature-review/scripts/lit_review_pipeline.py search \
     --out review_workspace \
     --keyword "KEYWORD 1" \
     --keyword "KEYWORD 2" \
     --email you@example.com \
     --openalex-api-key "$OPENALEX_API_KEY"
   ```

4. Screen the seed set by title and abstract only. Keep a filtered file named `screening/filtered_candidates.jsonl`. Exclude off-topic records and record concise reasons when practical. Log how many papers were screened, how many were kept, common exclusion reasons, and any borderline decisions.

5. Expand the filtered frontier by backward and forward citations:

   ```bash
   python /path/to/write-literature-review/scripts/lit_review_pipeline.py expand \
     --workspace review_workspace \
     --input screening/filtered_candidates.jsonl \
     --email you@example.com \
     --openalex-api-key "$OPENALEX_API_KEY"
   ```

   This expansion uses:
   - backward citations: all works referenced by the filtered frontier
   - forward citations: all works that cite the filtered frontier

6. Deduplicate the expansion output by script and screen the new queue by title and abstract only. The script maintains `screening/visited_titles.jsonl` so already-seen titles are not re-screened. Repeat Step 5 and this screening step until `screening/screening_queue.jsonl` becomes empty or yields no newly relevant papers. For every iteration, append a short round summary to `logs/workflow_log.md`.
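   The Subtasks section below names a `dedupe` subcommand, but the workflow does not spell out its invocation. The following is a sketch only: the flags are an assumption patterned on the other subcommands, paired with one simple way to test the stopping condition:

   ```bash
   # Hypothetical invocation: the dedupe subcommand exists (see Subtasks),
   # but these flags are assumed by analogy with expand and rank.
   python /path/to/write-literature-review/scripts/lit_review_pipeline.py dedupe \
     --workspace review_workspace

   # One way to spot saturation: the queue file is empty (or absent)
   # after a dedupe-and-screen round.
   if [ ! -s review_workspace/screening/screening_queue.jsonl ]; then
     echo "Screening queue is empty: expansion has saturated."
   fi
   ```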
7. Rank the final filtered set by transparent metadata signals, emphasizing citation count while also considering source or publisher presence and author count:

   ```bash
   python /path/to/write-literature-review/scripts/lit_review_pipeline.py rank \
     --workspace review_workspace \
     --input screening/filtered_candidates.jsonl
   ```

   Treat the rank as a heuristic, not as proof of scientific quality.

8. Read deeply into the ranked set. Try to retrieve and inspect no more than 30 full texts from the top-ranked papers:

   ```bash
   python /path/to/write-literature-review/scripts/lit_review_pipeline.py fulltext \
     --workspace review_workspace \
     --input references/ranked_references.jsonl \
     --core-api-key "$CORE_API_KEY" \
     --limit 30
   ```

   Read and summarize only the concepts most relevant to the user's topic. Do not pretend to have read unavailable papers. Log which papers were actually accessed, which were unavailable, and the main concepts extracted from each read.

9. Build the knowledge graph in two layers:
   - first write a text version that connects concepts, claims, methods, gaps, and references
   - then build the structured graph artifact:

   ```bash
   python /path/to/write-literature-review/scripts/build_knowledge_graph.py \
     --workspace review_workspace
   ```

10. Write the final literature review markdown in `review/literature_review.md`, then render the root deliverable PDF:

    ```bash
    python /path/to/write-literature-review/scripts/lit_review_pipeline.py render-review \
      --workspace review_workspace
    ```

11. Finalize `logs/workflow_log.md` with a concise end summary: total search rounds, total screened papers, final included set size, full texts actually read, major themes found, and known blind spots or limitations.

## Iteration Rules

- Use `screening/screening_queue.jsonl` as the next batch to screen.
- Use `screening/filtered_candidates.jsonl` as the current relevant frontier and cumulative included set for expansion.
- Keep `screening/visited_titles.jsonl` as the visited-title boundary across all rounds.
- Do not cite or rely on papers outside the final filtered set.
- Stop expansion when a new round produces no newly relevant papers.

## Workspace Layout

Keep only three presentation files at the workspace root:

- `literature_review.pdf`
- `references.md`
- `knowledge_graph.png`

Put all editable or machine-readable files in shallow named folders:

- `plan/`: scope and planning notes
- `logs/`: step-by-step execution log and audit trail
- `search/`: keyword drafts, seed set, merged search pool, deduped search pool
- `screening/`: screening queue, filtered set, visited titles, screening log
- `expansion/`: raw backward and forward expansion outputs
- `fulltext/`: full-text discovery outputs and reading notes
- `review/`: editable markdown review source
- `graph/`: graph markdown, graph JSON, and graph DOT source
- `references/`: ranked reference artifacts and optional machine-readable reference exports

## Deliverables

Produce these files unless the user asks for a different format:

- `plan/review_plan.md`: scope, criteria, assumptions, and search boundaries.
- `logs/workflow_log.md`: step-by-step record of actions taken, outputs produced, and decisions made.
- `search/seed_keywords.json`: up to 10 drafted keywords.
- `search/seed_candidates.jsonl`: initial title-and-abstract seed set.
- `screening/screening_queue.jsonl`: newly expanded records awaiting LLM screening.
- `screening/visited_titles.jsonl`: all already visited article titles, including seeds and candidates.
- `screening/filtered_candidates.jsonl`: final title-and-abstract relevant set after iterative screening.
- `references/ranked_references.jsonl`: ranked final filtered list.
- `references/ranked_references.md`: readable ranking table.
- `fulltext/fulltext_hits.json`: discovery results for up to 30 deeper reads.
- `review/literature_review.md`: editable review source.
- `graph/knowledge_graph.md`: text version of the concept-reference graph.
- `graph/knowledge_graph.json`: structured graph nodes and edges.
- `graph/knowledge_graph.dot`: Graphviz source used to render the root PNG (see the render sketch after this list).
- `literature_review.pdf`: final review article at root.
- `references.md`: final reference list at root.
- `knowledge_graph.png`: final graph image at root.
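If the root PNG ever falls out of sync with the DOT source, it can be re-rendered directly with Graphviz. This is a minimal sketch, assuming `dot` is installed and that the workspace root is `review_workspace`:

```bash
# Re-render the root graph image from its Graphviz source.
dot -Tpng review_workspace/graph/knowledge_graph.dot \
  -o review_workspace/knowledge_graph.png
```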
## Subtasks

Use code-based subtasks for repeatable operations:

- Seed search by keyword: `scripts/lit_review_pipeline.py search`
- Backward and forward expansion: `scripts/lit_review_pipeline.py expand`
- Deduplication: `scripts/lit_review_pipeline.py dedupe`
- Reference ranking: `scripts/lit_review_pipeline.py rank`
- Full-text discovery: `scripts/lit_review_pipeline.py fulltext`
- Knowledge graph construction: `scripts/build_knowledge_graph.py`

Use prompt-based judgment for interpretation-heavy operations:

- Drafting the keyword list from the user topic.
- Screening title-and-abstract candidates for relevance.
- Deciding when saturation has been reached.
- Interpreting methods, concepts, findings, and limitations.
- Writing the final review.

## Screening Rules

- During iterative screening, judge relevance from title and abstract only.
- Prefer recall in early rounds and precision in later rounds.
- Keep borderline-but-possibly-useful papers unless the mismatch is clear.
- Track inclusion and exclusion decisions clearly enough that another reviewer could audit them.
- Never confuse "full text not found" with "not relevant."

## References

Read these files before running a substantial review:

- `references/workflow.md`: detailed iterative search, screening, and ranking rules.
- `references/workflow_logging.md`: format and expectations for the step-by-step workflow log.
- `references/review_writing.md`: template and instructions for the final literature review article.