---
name: Answering Research Questions
description: Main orchestration workflow for systematic literature research - search, evaluate, traverse, synthesize
when_to_use: When user asks a research question. When user wants to find specific data in literature. When starting comprehensive literature review. When user says "find papers about" or "what is known about".
version: 1.0.0
---

# Answering Research Questions

## Overview

Orchestrate the complete research workflow from query to findings.

**Core principle:** Systematic, trackable, comprehensive. Search → Evaluate → Traverse → Synthesize.

**Announce at start:** "I'm using the Answering Research Questions skill to find [specific data] about [topic]."

## The Process

### Phase 1: Parse Query

Extract from user's request:

**Keywords:**
- Main concepts (e.g., "BTK inhibitor", "selectivity")
- Synonyms and alternatives (e.g., "Bruton tyrosine kinase")
- Related terms (e.g., "off-target", "kinase panel")

**Data types needed:**
- Specific measurements (IC50, KD, EC50, etc.)
- Methods or protocols
- Structures or sequences
- Results or conclusions

**Constraints:**
- Date ranges
- Specific compounds/targets
- Organisms or systems
- Publication types

**Ask clarifying questions if needed:**
- "Are you looking for in vitro or in vivo data?"
- "Any specific time frame?"
- "Which kinases are you most interested in?"
- **"What email address should I use for Unpaywall API requests?"** (Required for finding open access papers)

### Phase 2: Initialize Research Session

**Propose folder name:**
```
research-sessions/YYYY-MM-DD-brief-description/
```

Example: `research-sessions/2025-10-11-btk-inhibitor-selectivity/`

**Show proposal to user:**
```
📁 Creating research folder: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   Proceed? (y/n)
```

**Create folder structure:**
```bash
mkdir -p "research-sessions/YYYY-MM-DD-description"/{papers,citations}
```

**Initialize files:**

**Core files (always create these):**

**papers-reviewed.json:**
```json
{}
```

**citations/citation-graph.json:**
```json
{}
```

**SUMMARY.md:**
```markdown
# Research Query: [User's question]

**Started:** YYYY-MM-DD HH:MM
**Keywords:** keyword1, keyword2, keyword3
**Data types sought:** IC50 values, selectivity data, synthesis methods

---

## Highly Relevant Papers (Score ≥ 8)

Papers scored using `evaluating-paper-relevance` skill:
- Score 0-10 based on: Keywords (0-3) + Data type (0-4) + Specificity (0-3)
- Score ≥ 8: Highly relevant with significant data
- Score 7: Relevant with useful data
- Score 5-6: Possibly relevant
- Score < 5: Not relevant

(Papers will be added here as found)

Example format:
### [Paper Title](https://doi.org/10.1234/example)
**DOI:** [10.1234/example](https://doi.org/10.1234/example) | **PMID:** [12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)

---

## Relevant Papers (Score 7)

(Papers will be added here as found)

---

## Possibly Relevant Papers (Score 5-6)

(Noted for potential follow-up)

---

## Search Progress

- Initial PubMed search: X results
- Papers reviewed: Y
- Papers with relevant data: Z
- Citations followed: N

---

## Key Findings

(Synthesized findings will be added as research progresses)
```

**CRITICAL: Always use clickable markdown links for DOIs and PMIDs**

**Auxiliary files (for large searches >100 papers):**

See `evaluating-paper-relevance` skill for guidance on when to create:
- **README.md** - Project overview, methodology, file inventory
- **TOP_PRIORITY_PAPERS.md** - Curated priority list organized by tier
- **evaluated-papers.json** - Rich structured data for programmatic access

For small searches (<50 papers), stick to core files only. For large searches (>100 papers), auxiliary files add significant organizational value.
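
The Phase 2 initialization steps could be scripted roughly as follows. This is a minimal sketch, not part of the skill itself: the function name `init_session` and its parameters are hypothetical, and the SUMMARY.md it writes is only an abbreviated header of the full template.

```python
import json
from datetime import date
from pathlib import Path


def init_session(base_dir, description, question, keywords, today=None):
    """Create the session folder and the three core tracking files.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not an API defined by this skill.
    """
    today = today or date.today()
    session = Path(base_dir) / f"{today.isoformat()}-{description}"

    # papers/ holds downloaded PDFs, citations/ holds the citation graph
    for sub in ("papers", "citations"):
        (session / sub).mkdir(parents=True, exist_ok=True)

    # Both tracking files start as empty JSON objects; existing files are kept
    for rel in ("papers-reviewed.json", "citations/citation-graph.json"):
        path = session / rel
        if not path.exists():
            path.write_text(json.dumps({}) + "\n", encoding="utf-8")

    # Abbreviated SUMMARY.md header; the full template has more sections
    summary = session / "SUMMARY.md"
    if not summary.exists():
        summary.write_text(
            f"# Research Query: {question}\n\n"
            f"**Keywords:** {', '.join(keywords)}\n",
            encoding="utf-8",
        )
    return session
```

Calling `init_session("research-sessions", "btk-inhibitor-selectivity", question, keywords)` would create a dated folder matching the naming pattern above. Existing files are left untouched so that re-running the helper does not erase tracking data.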
### Phase 3: Search Literature

**Use searching-literature skill:**

1. Construct PubMed query from keywords
2. Execute search (start with 100 results)
3. Save results to `initial-search-results.json`
4. Report: "🔎 Found N papers matching query"

### Phase 4: Evaluate Papers

**Use evaluating-paper-relevance skill:**

For each paper:
1. Check papers-reviewed.json (skip if already processed)
2. Stage 1: Score abstract (0-10)
3. If score ≥ 7: Stage 2 deep dive
4. Extract findings to SUMMARY.md
5. Download PDF and supplementary if available
6. **Update papers-reviewed.json (for ALL papers, even low-scoring ones)**
7. If score ≥ 7: proceed to Phase 5 for this paper

**CRITICAL: Add every paper to papers-reviewed.json regardless of score. This prevents re-review and tracks complete search history.**

**Report progress for EVERY paper:**
```
📄 [15/100] Screening: "Paper Title"
   Abstract score: 8 → Fetching full text...
   ✓ Found IC50 data for 8 compounds
   → Added to SUMMARY.md

📄 [16/100] Screening: "Another Paper"
   Abstract score: 3 → Skipping (not relevant)

📄 [17/100] Screening: "Third Paper"
   Abstract score: 7 → Relevant, adding to queue...
```

**Every 10 papers, give summary update**

### Phase 5: Traverse Citations

**Use traversing-citations skill:**

For papers scoring ≥ 7:
1. Get references (backward)
2. Get citations (forward)
3. Filter for relevance (score ≥ 5)
4. Add to processing queue
5. Evaluate queued papers (return to Phase 4)

**Report progress:**
```
🔗 Following citations from highly relevant paper
   → Found 12 relevant references
   → Found 8 relevant citing papers
   → Adding 20 papers to queue
```

### Phase 6: Checkpoint

**Check after:**
- Every 50 papers reviewed
- Every 5 minutes of processing
- Queue exhausted

**Ask user:**
```
⏸️ Checkpoint: Reviewed 50 papers, found 12 relevant
   Papers with data: 7
   Continue searching? (y/n/summary)
```

**Options:**
- `y` - Continue processing
- `n` - Stop and finalize
- `summary` - Show current findings, then decide

### Phase 7: Synthesize Findings

**When stopping (user says no or queue empty):**

**Option A: Manual synthesis (small research sessions)**

1. **Review SUMMARY.md** - Organize by relevance and topic
2. **Extract key findings** - Group by data type
3. **Add synthesis section:**

```markdown
## Key Findings Summary

### IC50 Values for BTK Inhibitors
- Compound A: 12 nM (Smith et al., 2023)
- Compound B: 45 nM (Doe et al., 2024)
- [More compounds...]

### Selectivity Data
- Compound A shows >80-fold selectivity vs other kinases
- Tested against panel of 50 kinases (Jones et al., 2023)

### Synthesis Methods
- Lead compounds synthesized via [method]
- Yields: 30-45%
- Full protocols in [papers]

### Gaps Identified
- No data on selectivity vs [specific kinase]
- Limited in vivo data
- Few papers on resistance mechanisms
```

4. **Update search progress stats**
5. **List all files downloaded**

**Option B: Script-based synthesis (large research sessions >50 papers)**

For large research sessions, consider creating a synthesis script:

**Create `generate_summary.py`:**
- Read `evaluated-papers.json` from helper scripts
- Aggregate findings by priority and scaffold type
- Generate comprehensive SUMMARY.md with:
  - Executive summary with statistics
  - Papers grouped by relevance score
  - Priority recommendations for next steps
  - Methodology documentation
- Include timestamps and reproducibility info

**Benefits:**
- Consistent formatting across sessions
- Easy to regenerate as more papers added
- Can customize grouping/filtering logic
- Documents complete methodology

**Final report:**
```
✅ Research complete!

📊 Summary:
- Papers reviewed: 127
- Relevant papers: 18
- Highly relevant: 7
- Data extracted: IC50 values for 45 compounds, selectivity data, synthesis methods

📁 All findings in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   - SUMMARY.md (organized findings)
   - papers/ (14 PDFs + supplementary data)
   - papers-reviewed.json (complete tracking)
```

### Phase 8: Final Consolidation

**CRITICAL: Always consolidate findings at the end**

#### 1. Create relevant-papers.json

**Filter papers-reviewed.json to extract only relevant papers (score ≥ 7):**

```python
import json

# Read papers-reviewed.json
with open('papers-reviewed.json') as f:
    all_papers = json.load(f)

# Filter for relevant papers (score >= 7)
relevant_papers = {
    doi: data
    for doi, data in all_papers.items()
    if data.get('score', 0) >= 7
}

# Save to relevant-papers.json
with open('relevant-papers.json', 'w') as f:
    json.dump(relevant_papers, f, indent=2)
```

**Format:**
```json
{
  "10.1234/example1.2023": {
    "pmid": "12345678",
    "title": "Paper title",
    "status": "highly_relevant",
    "score": 9,
    "source": "pubmed_search",
    "timestamp": "2025-10-11T16:00:00Z",
    "found_data": ["IC50 values", "synthesis methods"],
    "chembl_id": "CHEMBL1234567"
  },
  "10.1234/example2.2023": {
    "pmid": "23456789",
    "title": "Another paper",
    "status": "relevant",
    "score": 7,
    "source": "forward_citation",
    "timestamp": "2025-10-11T16:15:00Z",
    "found_data": ["MIC data"]
  }
}
```

#### 2. Enhance SUMMARY.md with Methodology Section

**Add these sections to the TOP of existing SUMMARY.md (before paper listings):**

````markdown
# Research Query: [User's question]

**Date:** 2025-10-11
**Duration:** 2h 15m
**Status:** Complete

---

## Search Strategy

**Keywords:** BTK, Bruton tyrosine kinase, inhibitor, selectivity, off-target, kinase panel, IC50
**Data types sought:** IC50 values, selectivity data, kinase panel screening
**Constraints:** None (open date range)

**PubMed Query:**
```
("BTK" OR "Bruton tyrosine kinase") AND (inhibitor OR "kinase inhibitor") AND (selectivity OR "off-target")
```

---

## Screening Methodology

**Rubric:** Abstract scoring (0-10)
- Key terms: +3 pts each (or Keywords 0-3, Data type 0-4, Specificity 0-3 if using old rubric)
- Relevant terms: +1 pt each
- Threshold: ≥7 = relevant

**Sources:**
- Initial PubMed search
- Forward/backward citations via Semantic Scholar

---

## Results Statistics

**Papers Screened:**
- Total reviewed: 127 papers
- Highly relevant (≥8): 12 papers
- Relevant (7): 18 papers
- Possibly relevant (5-6): 23 papers
- Not relevant (<5): 74 papers

**Data Extracted:**
- IC50 values: 45 compounds across 12 papers
- Selectivity data: 8 papers with kinase panel screening
- Full text obtained: 18/30 relevant papers (60%)

**Citation Traversal:**
- Papers with citations followed: 7
- References screened: 45 papers
- Citing papers screened: 38 papers
- Relevant papers found via citations: 8 papers

---

## Key Findings Summary

### IC50 Values for BTK Inhibitors
- Ibrutinib: 0.5 nM (Smith et al., 2023)
- Acalabrutinib: 3 nM (Doe et al., 2024)
- [Additional findings synthesized from papers below]

### Selectivity Patterns
- Most inhibitors show >50-fold selectivity vs other kinases
- Common off-targets: TEC, BMX (other TEC family kinases)

### Gaps Identified
- Limited data on selectivity vs JAK/SYK
- Few papers on resistance mechanisms
- No in vivo selectivity data found

---

## File Inventory

- `SUMMARY.md` - This file (methodology + findings)
- `relevant-papers.json` - 30 relevant papers (score ≥7)
- `papers-reviewed.json` - All 127 papers screened
- `papers/` - 18 PDFs + 5 supplementary files
- `citations/citation-graph.json` - Citation relationships

---

## Reproducibility

**To reproduce:**
1. Use PubMed query above
2. Apply screening rubric (threshold ≥7)
3. Follow citations from highly relevant papers (≥8)
4. Check Unpaywall for paywalled papers

**Software:** Research Superpowers skills v2025-10-11

---

[Existing paper listings follow below...]

## Highly Relevant Papers (Score ≥ 8)

### [Paper Title]...
````

**Report to user:**
```
✅ Research session complete!

📄 Consolidation complete:
   1. SUMMARY.md - Enhanced with methodology, statistics, and findings
   2. relevant-papers.json - 30 relevant papers (score ≥7) in JSON format

📁 All files in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   - SUMMARY.md (complete: methodology + paper-by-paper findings)
   - relevant-papers.json (30 relevant papers for programmatic access)
   - papers-reviewed.json (127 total papers screened)
   - papers/ (18 PDFs)

🔍 Quick access:
   - Open SUMMARY.md for complete findings and methodology
   - Use relevant-papers.json for programmatic access

💡 Optional: Clean up intermediate files?
   → Use cleaning-up-research-sessions skill to safely remove temporary files
```

## Workflow Checklist

**Use TodoWrite to track these steps:**

- [ ] Parse user query (keywords, data types, constraints)
- [ ] Propose and create research folder
- [ ] Initialize tracking files (SUMMARY.md, papers-reviewed.json, citation-graph.json)
- [ ] Search PubMed using searching-literature skill
- [ ] For each paper: evaluate using evaluating-paper-relevance skill
- [ ] For relevant papers (≥7): traverse citations using traversing-citations skill
- [ ] Report progress regularly
- [ ] Checkpoint every 50 papers or 5 minutes
- [ ] When done: synthesize findings and enhance SUMMARY.md with methodology
- [ ] Create relevant-papers.json (filtered JSON for programmatic access)
- [ ] Final report with stats and file locations

## Integration Points

**Skills used:**
1. `searching-literature` - Initial PubMed search
2. `evaluating-paper-relevance` - Score and extract from papers
3. `traversing-citations` - Follow citation networks

**All skills coordinate through:**
- Shared `papers-reviewed.json` (deduplication)
- Shared `SUMMARY.md` (findings accumulation)
- Shared `citation-graph.json` (relationship tracking)

**File organization:**
- **Small searches (<50 papers):** Core files only (papers-reviewed.json, SUMMARY.md, citation-graph.json)
- **All searches:** Create relevant-papers.json at end; enhance SUMMARY.md with methodology
- **Large searches (>100 papers):** May add auxiliary files (README.md, TOP_PRIORITY_PAPERS.md, evaluated-papers.json) for better organization

## Error Handling

**No results found:**
- Try broader keywords
- Remove constraints
- Check spelling
- Try different synonyms

**API rate limiting:**
- Report to user: "⏸️ Rate limited, waiting..."
- Wait required time
- Resume automatically

**Full text unavailable:**
- Note in SUMMARY.md
- Continue with abstract-only evaluation
- Flag for manual retrieval if highly relevant

**Too many results (>500):**
- Suggest narrowing query
- Process first 100, ask if continue
- Focus on most recent or most cited

## Quick Reference

| Phase | Skill | Output |
|-------|-------|--------|
| Parse | (built-in) | Keywords, data types, constraints |
| Initialize | (built-in) | Folder, SUMMARY.md, tracking files |
| Search | searching-literature | List of papers with metadata |
| Evaluate | evaluating-paper-relevance | Scored papers, extracted findings |
| Traverse | traversing-citations | Additional papers from citations |
| Synthesize | (built-in) | Enhanced SUMMARY.md with methodology + findings |
| Consolidate | (built-in) | relevant-papers.json (filtered to score ≥7) |

## Common Mistakes

**Not tracking all papers:** Only adding relevant papers to papers-reviewed.json → Add EVERY paper to prevent re-review, track complete history

**Creating unnecessary auxiliary files for small searches:** For <50 papers, stick to core files (papers-reviewed.json, SUMMARY.md, citation-graph.json). For large searches (>100 papers), auxiliary files like README.md and TOP_PRIORITY_PAPERS.md add value.

**Silent work:** User can't see progress → Report EVERY paper, give updates every 10

**Non-clickable identifiers:** Plain text DOIs/PMIDs → Always use markdown links

**Jumping to evaluation without good search:** Too narrow results → Optimize search first

**Not tracking papers:** Re-reviewing same papers → Always use papers-reviewed.json

**Following all citations:** Exponential explosion → Filter before traversing

**No checkpoints:** User loses context → Report and ask every 50 papers

**Poor synthesis:** Just list papers → Group by data type, extract key findings

**Batch reporting:** Reporting 20 papers at once → Report each one as you go

## User Communication (CRITICAL)

**NEVER work silently! User needs continuous feedback.**

**Report frequency:**
- **Every paper:** Brief status as you screen (`📄 [N/Total] Title... Score: X`)
- **Every 5-10 papers:** Progress summary with counts
- **Every finding:** Immediately report what data you found
- **Every decision point:** Ask before changing direction

**Be specific in progress reports:**
- ✅ "Found IC50 = 12 nM for compound 7 (Table 2)"
- ❌ "Found data"
- ✅ "Screening paper 25/127: Not relevant (score 3)"
- ❌ Silently skip papers

**Ask for clarification when needed:**
- ✅ "Are you looking for in vitro or in vivo IC50 values?"
- ❌ Assume and potentially waste time

**Report blockers immediately:**
- ✅ "⚠️ Paper behind paywall - evaluating from abstract only"
- ❌ Silently skip without mentioning

**Periodic summaries (every 10-15 papers):**
```
📊 Progress update:
- Reviewed: 30/127 papers
- Highly relevant: 3 (scores 8-10)
- Relevant: 5 (score 7)
- Currently: Screening paper 31...
```

**Why:** User can course-correct early, knows work is happening, can stop if needed

## Success Criteria

Research session successful when:
- All relevant papers found and evaluated
- Specific data extracted and organized
- Citations followed systematically
- No duplicate processing
- Clear SUMMARY.md with actionable findings
- User questions answered with evidence

## Next Steps

After completing research:
- User reviews SUMMARY.md and relevant-papers.json
- **Optional**: Run cleaning-up-research-sessions skill to remove intermediate files
- May request deeper dive into specific papers
- May request follow-up searches with refined keywords
- May archive or share research session folder
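
As a recap of the score thresholds used throughout this workflow (≥ 8, 7, 5-6, < 5), the tiering and the per-tier counts reported in progress summaries could be computed along these lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the tier labels `possibly_relevant` and `not_relevant` are assumed by analogy with the `highly_relevant` and `relevant` status values shown in the relevant-papers.json format.

```python
from collections import Counter


def tier_for_score(score):
    """Map a 0-10 relevance score to a tier label.

    Thresholds follow the rubric in this document; the last two labels
    are assumptions, not values defined by the skill.
    """
    if score >= 8:
        return "highly_relevant"
    if score >= 7:
        return "relevant"
    if score >= 5:
        return "possibly_relevant"
    return "not_relevant"


def tier_counts(papers):
    """Count papers per tier; entries without a score count as not relevant."""
    return Counter(tier_for_score(p.get("score", 0)) for p in papers.values())


if __name__ == "__main__":
    # Illustrative entries only, shaped like papers-reviewed.json
    papers = {
        "10.1234/example1.2023": {"score": 9},
        "10.1234/example2.2023": {"score": 7},
        "10.1234/example3.2023": {"score": 3},
    }
    for tier, count in tier_counts(papers).items():
        print(f"{tier}: {count}")
```

Keeping the thresholds in one function means the progress updates, the final statistics, and the relevant-papers.json filter all agree on where each tier begins.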