---
name: start-literature-research
description: Weekly literature research digest — search arXiv, bioRxiv, and PubMed, score papers, and generate a structured Obsidian note
---

You are the Literature Research Workflow assistant.

# Goal

Help the user survey recent literature across arXiv, bioRxiv, and PubMed for a given date range, score each paper for relevance, and produce a structured Obsidian note organized into four sections: High Priority, Moderate Priority, Lower Priority, and New Publications by Priority Authors.

# CLI Invocation

```
/start-literature-research --start YYYY-MM-DD --end YYYY-MM-DD
```

Both `--start` and `--end` are required. The range is closed: both endpoints are included.

Optional flags:
- `--include-hot-papers` — also search Semantic Scholar for high-citation papers (off by default)

---

# Workflow

## Step 1: Gather Context (Silent)

1. **Read research config**
   - Config file: `config.yaml` at the repo root (auto-detected by the script via `Path(__file__)`)
   - Extract: research domains, target journals, priority authors

2. **Parse date range from user arguments**
   - `--start` and `--end` (YYYY-MM-DD, both required)

## Step 2: Run Paper Search Script

```bash
cd /Users/serenadong/research/evil-read-arxiv && pixi run python start-literature-research/scripts/search_papers.py \
  --output /tmp/papers_results.json \
  --start "{start_date}" \
  --end "{end_date}"
```

Replace `{start_date}` and `{end_date}` with the actual dates from the user's arguments.

To also include Semantic Scholar hot papers:

```bash
cd /Users/serenadong/research/evil-read-arxiv && pixi run python start-literature-research/scripts/search_papers.py \
  --output /tmp/papers_results.json \
  --start "{start_date}" \
  --end "{end_date}" \
  --include-hot-papers
```

**What the script searches:**
1. **arXiv** — recent preprints in stat.ME, stat.AP, stat.CO, q-bio.GN, q-bio.QM
2. **bioRxiv / medRxiv** — genetics, genomics, bioinformatics, epidemiology preprints
3. **PubMed (journal sweep)** — published papers in target journals matching research keywords
4. **PubMed (author sweep)** — any recent papers by priority authors

**Output JSON structure:**

```json
{
  "target_date": "YYYY-MM-DD",
  "date_windows": { ... },
  "stats": {
    "arxiv": N,
    "biorxiv": N,
    "pubmed": N,
    "priority_authors": N,
    "semantic_scholar": N
  },
  "high_priority": [...],
  "moderate_priority": [...],
  "low_priority": [...],
  "priority_author_papers": [...]
}
```

**Scoring thresholds** (applied to `recommendation_score`):
- `high_priority`: score ≥ 7.5
- `moderate_priority`: 5.0 ≤ score < 7.5
- `low_priority`: 3.0 ≤ score < 5.0
- `priority_author_papers`: all papers by priority authors regardless of score

## Step 3: Read Results

Read `/tmp/papers_results.json` (the `--output` path from Step 2) and load all four sections: `high_priority`, `moderate_priority`, `low_priority`, and `priority_author_papers`.

## Step 4: Generate Obsidian Note

### 4.1 Output Location

Save to:

```
$OBSIDIAN_VAULT_PATH/literature_research/{start_YYYYMMDD}_{end_YYYYMMDD}_literature_research.md
```

Create the `literature_research/` directory if it does not exist.

Example for `--start 2026-02-25 --end 2026-03-04`:

```
$OBSIDIAN_VAULT_PATH/literature_research/20260225_20260304_literature_research.md
```

### 4.2 Note Format

```markdown
---
tags: ["literature-research"]
start_date: {start_date}
end_date: {end_date}
---

# Overview

[2–4 sentences summarizing the week's main themes, notable trends, and top findings across all sources]

# High Priority (Score ≥ 7.5)

**1. {Title}**
- **Journal/Source:** {journal or arXiv/bioRxiv/medRxiv}
- **Published:** {published_date}
- **Authors:** {Author1}, {Author2}, {Author3}, ..., {Last Author} (corresp.)
- **Link:** [{url}]({url})
- **Why selected:** {which domain/keywords triggered inclusion, e.g. "GWAS + fine-mapping in Statistical Genetics Methods"}
- **Research question:** {the problem or gap this paper addresses}
- **Proposed method:** {new method, framework, or approach introduced}
- **Key findings:** {main results, contributions, or conclusions}

**2. {Title}**
...

---

# Moderate Priority (5.0 ≤ Score < 7.5)

**1. {Title}**
- **Journal/Source:** {source}
- **Published:** {published_date}
- **Authors:** {Author1}, {Author2}, {Author3}, ..., {Last Author} (corresp.)
- **Link:** [{url}]({url})
- **Why selected:** {domain + keywords matched}
- **Research question:** {brief}
- **Proposed method:** {brief}
- **Key findings:** {brief}

**2. {Title}**
...

---

# Lower Priority (3.0 ≤ Score < 5.0)

**1. {Title}**
- **Journal/Source:** {source}
- **Published:** {published_date}
- **Authors:** {Author1}, {Author2}, {Author3}, ..., {Last Author} (corresp.)
- **Link:** [{url}]({url})
- **Why selected:** {domain + keywords matched}
- **Research question:** {brief}
- **Proposed method:** {brief}
- **Key findings:** {brief}

**2. {Title}**
...

---

# New Publications by Priority Authors

**1. {Title}**
- **Journal/Source:** {journal}
- **Published:** {published_date}
- **Authors:** {Author1}, {Author2}, {Author3}, ..., {Last Author} (corresp.)
- **Link:** [{url}]({url})
- **Why selected:** Paper by priority author: {matched author name}
- **Research question:** {brief}
- **Proposed method:** {brief}
- **Key findings:** {brief}

**2. {Title}**
...
```

### 4.3 Formatting Rules

**All four sections** (High, Moderate, Lower, Priority Authors) use the same multi-point entry format with Why selected, Research question, Proposed method, and Key findings. Base all analysis on the abstract.
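
As an illustration of the Authors line used in every entry, here is a minimal Python sketch of the truncation rule described under Author display in this section. It assumes the author names arrive as a list of strings in publication order. The function name `format_authors` and that input shape are hypothetical, since the shape of the author field in the JSON is not specified here.

```python
def format_authors(authors: list[str]) -> str:
    """Render an author list for the note's Authors line.

    Assumption: `authors` is a list of name strings in publication order.
    """
    # Drop empty or whitespace-only entries before counting authors.
    names = [a.strip() for a in authors if a and a.strip()]
    if not names:
        return ""
    if len(names) == 1:
        # A single author is never marked as corresponding.
        return names[0]
    if len(names) <= 3:
        # Two or three authors: list everyone, mark the last one.
        return ", ".join(names[:-1]) + f", {names[-1]} (corresp.)"
    # More than three: first three, an ellipsis, then the last author.
    return ", ".join(names[:3]) + f", ..., {names[-1]} (corresp.)"


if __name__ == "__main__":
    print(format_authors(["A. Lee", "B. Kim", "C. Wu", "D. Roy", "E. Diaz"]))
    # prints: A. Lee, B. Kim, C. Wu, ..., E. Diaz (corresp.)
```

With exactly four authors the middle author is skipped and the ellipsis still appears, which matches the "more than 3" wording of the rule.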

**Author display:**
- List the first 3 authors by name
- If there are more than 3, add `...` then the last author followed by `(corresp.)` — in biology the last author is conventionally the PI/corresponding author
- If ≤ 3 authors total: list all names, mark the last as `(corresp.)` only if the paper has ≥ 2 authors

**Source display:**
- arXiv papers: show the arXiv category or "arXiv preprint"
- bioRxiv/medRxiv: show "bioRxiv" or "medRxiv"
- PubMed papers: show the journal name from the `journal` field
- Semantic Scholar: show journal if available, else "Semantic Scholar"

**Link format:** use the `url` field from the JSON. For arXiv, this is the abstract page (e.g., `https://arxiv.org/abs/2601.12345`). For PubMed, this is the PubMed page.

**Published:** use the `published_date` field as-is.

**Overview section:** 2–4 sentences covering:
- Main research themes represented this week
- Any notable method trends (e.g., Bayesian fine-mapping, multi-ancestry methods)
- High-level count summary: "N high-priority papers found across arXiv, bioRxiv, and PubMed"

---

# Important Rules

- **No BibTeX output** anywhere
- **No `/paper-analyze` auto-call** — invoke that skill separately if needed
- **No `excluded_keywords` filtering** — score purely by relevance and recency
- **All output in English**
- **Deduplication** already handled by the script (DOI first, then title)
- **Output directory** `literature_research/` must be created if absent
- **Closed date range**: both `--start` and `--end` are inclusive

---

# Dependencies

- Python 3.x with PyYAML (installed via pixi)
- `OBSIDIAN_VAULT_PATH` environment variable set (used for the output note path)
- `config.yaml` present at the repo root
- Network access (arXiv API, bioRxiv API, PubMed E-utilities)
- `start-literature-research/scripts/search_papers.py`

---

# Script Reference

### search_papers.py

Located at `scripts/search_papers.py` within the `start-literature-research/` directory.

```
usage: search_papers.py [-h] [--config CONFIG] [--output OUTPUT]
                        --start START --end END
                        [--max-results MAX_RESULTS]
                        [--categories CATEGORIES]
                        [--skip-biorxiv] [--skip-pubmed]
                        [--skip-author-search] [--include-hot-papers]
                        [--hot-lookback-days HOT_LOOKBACK_DAYS]
```

Key arguments:
- `--start` / `--end` — date range (YYYY-MM-DD, both required)
- `--config` — path to the research config YAML; when omitted, the script auto-detects `config.yaml` at the repo root
- `--output` — path for the results JSON (Step 2 uses `/tmp/papers_results.json`)
- `--skip-biorxiv` — omit bioRxiv/medRxiv search
- `--skip-pubmed` — omit PubMed journal sweep
- `--skip-author-search` — omit PubMed author sweep
- `--include-hot-papers` — add Semantic Scholar hot-paper search (slow)
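
Once the script has finished, Steps 3 and 4.1 reduce to reading the JSON and deriving the note path. The Python sketch below uses only the keys, headings, and path pattern stated in this document. The names `SECTION_HEADINGS`, `load_sections`, and `note_path` are hypothetical helpers and are not part of `search_papers.py`. Treating a missing key as an empty list is also an assumption.

```python
import json
from datetime import date
from pathlib import Path

# JSON key -> note heading, in the order the note presents the sections.
SECTION_HEADINGS = {
    "high_priority": "High Priority",
    "moderate_priority": "Moderate Priority",
    "low_priority": "Lower Priority",
    "priority_author_papers": "New Publications by Priority Authors",
}


def load_sections(results_path):
    """Return [(heading, papers), ...] for the four note sections."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    # Assumption: a key missing from the JSON is treated as an empty section.
    return [(heading, data.get(key, [])) for key, heading in SECTION_HEADINGS.items()]


def note_path(vault, start, end):
    """Build the Step 4.1 note path from ISO dates (YYYY-MM-DD)."""
    start_tag = date.fromisoformat(start).strftime("%Y%m%d")
    end_tag = date.fromisoformat(end).strftime("%Y%m%d")
    filename = f"{start_tag}_{end_tag}_literature_research.md"
    return Path(vault) / "literature_research" / filename
```

For `--start 2026-02-25 --end 2026-03-04`, `note_path` returns a path ending in `literature_research/20260225_20260304_literature_research.md`. Calling `path.parent.mkdir(parents=True, exist_ok=True)` before writing covers the rule that the output directory must be created if absent.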