---
name: research
version: 2026.05.05
effort: medium
triggers: arxiv|paper|research paper|citation|SOTA|prior art|doi|literature|literature review|preprint|study|publication|h-index|scholarly|academic|semantic scholar|openalex|pubmed|europepmc|crossref|dblp|unpaywall|internet archive|oreilly|find papers|who cites|survey of|novel claim|orcid|ror
description: >
  Academic / literature research orchestrator. Routes intent -> best free tool first
  (arxiv, semantic_scholar, openalex, crossref, europepmc, pubmed, unpaywall, orcid, ror,
  dblp, openlibrary, internet_archive_search, oreilly_search), escalates to paid only when
  free < 3 hits. Auto-picks for "find papers on X", "prior art", "SOTA map", "who cites Y",
  "OA PDF for DOI", "author disambiguation", "literature review". NEVER WebFetch.
composes_with: autoresearch, elite-loop, "!:discover"
---

# Research Skill — Academic & Literature Orchestrator

> Replaces the 1-line `Academic: context7 | arXiv` routing in `core-minimal.md`
> with a decision matrix and tested chain patterns.
> All tools listed are **FREE** unless marked `$`. All via `mcp__gateway__gateway_execute`,
> server prefix `fulcrum:`.
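The description's escalation rule (go to paid tools only when free tools return fewer than 3 hits) can be written as a cheapest-first tier loop. This Python version is only a minimal sketch, not the gateway's implementation. The following are all assumptions: the `Tier` tuples, the stub search functions, and counting pooled, deduplicated hits across tiers rather than each tier's own hits.

```python
from typing import Callable, List, Sequence, Tuple

MIN_USEFUL_HITS = 3  # from the description: paid tools only when free < 3 hits

# One escalation tier: (tool name, USD cost per call, search function).
# The search functions are hypothetical stand-ins for gateway calls.
Tier = Tuple[str, float, Callable[[str], List[str]]]


def search_with_escalation(query: str, tiers: Sequence[Tier]) -> Tuple[List[str], float]:
    """Run tiers cheapest-first and stop once pooled unique hits reach the threshold."""
    hits: List[str] = []
    spent = 0.0
    for _name, cost, search in tiers:
        for hit in search(query):
            if hit not in hits:  # pool across tiers, keeping first-seen order
                hits.append(hit)
        spent += cost
        if len(hits) >= MIN_USEFUL_HITS:
            break  # enough coverage; do not pay for the next tier
    return hits, spent


if __name__ == "__main__":
    # Stub tiers for illustration only; costs mirror the $ markings used in this skill.
    tiers = [
        ("free scholarly fanout", 0.0, lambda q: ["paper-a", "paper-b"]),
        ("exa_search", 0.005, lambda q: ["paper-b", "paper-c"]),
        ("parallel_search", 0.004, lambda q: ["paper-d"]),
    ]
    print(search_with_escalation("hebbian llm memory", tiers))
```

With these stubs, the loop stops after `exa_search` and never calls `parallel_search`. It also does not model the "widen query / drop filters" fallback.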
## Prime directive

```
free-first -> escalate only when free<3 hits | NEVER WebFetch (~50K tokens/call wasted)
<3 hits on free -> try exa_search($0.005, semantic)
<3 hits on exa  -> parallel_search($0.004, multi-hop)
still nothing   -> widen query / drop filters before paying more
```

## Research-type -> best-tool matrix (V evidence tested 2026-05-05)

| # | Intent | Primary (FREE) | Secondary | Why | Conf |
| --- | --- | --- | --- | --- | --- |
| 1 | Fresh preprints (last 6 months, ML/CS) | `arxiv_search` sortBy=submittedDate | `semantic_scholar` year=2025-2026 | arXiv indexes preprints on day 0; S2 lags by weeks | V |
| 2 | Citation graph / influence | `semantic_scholar` fields=citationCount | `openalex_search_works` | S2 returns a clean citationCount inline | V |
| 3 | Specific author's works | `openalex_get_author` | `semantic_scholar` + name | OpenAlex returns h-index + works_count + ORCID in one call; S2 needs a paper->author hop | I |
| 4 | OA PDF for a DOI | `unpaywall_find_open` | `europepmc_search` (for biomed) | Unpaywall is THE canonical "is this free?" DB (~50M); does NOT index arXiv DOIs | V |
| 5 | Biomedical / clinical | `europepmc_search` | `pubmed_search` + `pubmed_fetch_article` | EPMC returns hasPDF/inEPMC/OA inline; PubMed search returns only a PMID list | V |
| 6 | CS-specific (established work) | `semantic_scholar` + `arxiv_search` | `dblp_search` | DBLP only indexes formally published venues (0 hits on "Hebbian LLM memory" 2026-04-24); a trap for fresh queries | V |
| 7 | Math-specific | `arxiv_search` `cat:math.*` | `crossref_search_works` | arXiv is the math preprint standard; Crossref for published journals | I |
| 8 | Historical / archived / pre-arXiv terminology | `internet_archive_search` | `openlibrary_search` (books) / `crossref_search_works` (DOI lineage) | IA finds archived texts, old arXiv mirrors, proceedings, patents, and pre-current-keyword material; OpenLibrary is book metadata only | V |
| 9 | Multi-hop "cites X AND disputes Y" | `parallel_search` ($0.004) | `exa_search` type=neural | parallel_search is built for chained multi-hop reasoning | I |
| 10 | Prior art (patent-claim grade) | chain: `arxiv_search` + `semantic_scholar` + `internet_archive_search` + `exa_search` | `parallel_search` | No single source; must triangulate preprint + published + archived web/text | I |
| 11 | SOTA map of a whole field | chain: `arxiv_search` (fresh) + `semantic_scholar` (influential, top-k by citations) + `openalex_search_works` (concept breadth) | `exa_search` to fill gaps | 3 axes: recent / influential / conceptual cluster | V |
| 12 | Novel-claim validation (cross-discipline) | `openalex_search_works` (concepts graph) | `semantic_scholar` | OpenAlex concepts span 250M works + cross-field topics | I |
| 13 | Author disambiguation / institution | `orcid_get_person` + `ror_search_organization` | `openalex_get_author` | ORCID = canonical ID, ROR = canonical institution | I |
| 14 | Production engineering pattern / practitioner implementation | `oreilly_search` | `oreilly_books_search` for book-level bibliography | O'Reilly surfaces book/chapter-level implementation guidance and operator patterns that paper indexes often omit | V |

Confidence: V = tested with a real query | I = schema-level argument | A = assumption.

## 3 chain patterns (Rams #10: less but better)

Pick one of these by intent. Default is pattern A.

### A. "Find papers on X" / survey (DEFAULT for 90% of asks)

```
1. fulcrum:arxiv_search {search_query: "all:X AND all:Y", sortBy: "submittedDate", max_results: 10}
2. fulcrum:semantic_scholar {query: "X Y", limit: 10, fields: "title,authors,year,citationCount,externalIds,abstract"}
3. Merge: dedupe by DOI/arXiv ID. Sort by (year desc, citationCount desc).
4. If historical, pre-arXiv, old terminology, or prior-art adjacent:
   fulcrum:internet_archive_search {q: "mediatype:texts AND (\"X\" OR \"Y\")", rows: 10, fl: "identifier,title,creator,date,mediatype,downloads,subject", sort: "downloads desc"}
5. If the ask includes production engineering, agent/product implementation, operational memory, deployment, eval, or system-design practice:
   fulcrum:oreilly_search {query: "X Y", formats: "book", limit: 10, sort: "relevance"}
6. Skip O'Reilly for pure citation counting, DOI/OA lookup, author disambiguation, or biomed-only searches;
   skip IA for fresh-only queries with a tight recent-year filter.
7. If the free scholarly + IA/O'Reilly fanout still returns < 3 useful hits:
   fulcrum:exa_search {query: "...", category: "research_paper", num_results: 5}
```

### B. "Prior art" / SOTA map

```
1. Run pattern A, then:
   openalex_search_works {search: "X", per_page: 10, filter: "publication_year:2023-2026", select: "id,title,publication_year,doi,cited_by_count,authorships"}
   (always pass `select`: the default response is ~35KB per result because of the inverted-index abstract)
2. For the top-5 most influential: semantic_scholar citation graph (paperId -> references + citations).
3. If patent-grade: add internet_archive_search for archived/pre-current-keyword evidence before paid web search.
4. If still under-covered: add parallel_search for non-paper web evidence.
```

### C. "Who is author X" / "I have DOI Y"

```
Author:        orcid_get_person || openalex_get_author
DOI->OA:       unpaywall_find_open   (NOT for arXiv DOIs: returns 404)
DOI->metadata: crossref_get_doi
```

## Parameter gotchas (save 5-10 rediscovery calls)

```
arxiv_search            -> search_query (NOT query), sortBy, sortOrder, max_results
dblp_search             -> q (NOT query), h (NOT limit)
openalex_search_works   -> search (NOT query), per_page, filter, select (USE IT)
pubmed_search           -> term (NOT query), retmax, sort
semantic_scholar        -> query, limit, fields, year
europepmc_search        -> query, pageSize, cursorMark
crossref_search_works   -> query, rows, filter
exa_search              -> query, category (research_paper), num_results, type
internet_archive_search -> q (Lucene), rows, fl, sort; use mediatype:texts for papers/books/proceedings
oreilly_search          -> query, formats, limit, page, sort; use formats=book for book/chapter results
oreilly_books_search    -> query, limit, page; use for whole-book bibliography/ISBN, not narrow chapter evidence
unpaywall_find_open     -> doi, email REQUIRED; use parm@iki.fi; fails on 10.48550/arXiv.* (use the publisher DOI)
openalex: pass `select: "id,title,publication_year,doi,cited_by_count,authorships"` or the response is ~35KB/hit
```

## Output format (citation-ready)

One line per hit. Year desc. Drop duplicates by DOI/arXiv ID. When a hit has no DOI, put the venue in its place, as in the first example line.

```
<Title> — <Authors> (<Year>) arXiv:<id> | doi:<doi> | cites:<n>
```

Example:

```
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents — Zhu, Li, Zhang et al. (2026) arXiv:2604.16839 | ACL 2026 | cites:0
Hebbian Memory-Augmented Recurrent Networks — Szelogowski (2025) arXiv:2507.21474 | doi:10.48550/arXiv.2507.21474 | cites:2
```

## Cost ceiling rules

```
FREE budget     -> unlimited (arxiv, S2, openalex, crossref, EPMC, pubmed, unpaywall, orcid, ror, dblp, openlibrary, internet_archive_search, oreilly_search/oreilly_books_search)
$0.005 exa      -> OK if free returned <3 hits OR the query needs neural (semantic) search
$0.004 parallel -> OK only for multi-hop chains (pattern B with a patent-grade ask)
$0.01 tavily    -> last resort only
WebFetch        -> NEVER (~50K tokens = $0.25 equivalent in context)
```

## Test evidence (2026-04-24, query: "Hebbian learning LLM memory")

| Tool | Hits / status | Top result | Cost |
| --- | --- | --- | --- |
| arxiv_search (sort=submittedDate) | 2 | HeLa-Mem (2026-04-18, ACL 2026) | FREE |
| semantic_scholar | 11,990 | Hebbian Memory-Augmented RNN (2025, cites:2) | FREE |
| openalex_search_works (2024-26) | 261 | HeLa-Mem | FREE but 113KB payload; USE `select` |
| crossref_search_works | 3,080,234 | Short-term Hebbian ~ transformer attention (Ellwood 2023, PLoS) | FREE |
| europepmc_search | 5,553 | Anti-Hebbian replay (Hippocampus 2026); all biomed | FREE |
| pubmed_search | 539 | biomed-only, PMID list only | FREE |
| dblp_search | **0** | none (DBLP too conservative for fresh preprint queries) | FREE |
| exa_search category=research_paper | 2 | HeLa-Mem arXiv abs URL | $0.007 |
| unpaywall_find_open (arXiv DOI) | 404 | none (doesn't index arXiv DOIs) | FREE |
| unpaywall_find_open (publisher DOI 10.1371/journal.pcbi.1011843) | gold OA | PLoS PDF + PMC mirror + DOAJ | FREE |
| internet_archive_search mediatype:texts | 93 | archived patents/proceedings and older neural-memory/Hebbian-learning texts, e.g. dHAN associative thought (2007) and visual-cortex Hebbian learning (2005) | FREE |
| oreilly_search `"LLM memory" OR "AI agents memory"` | 28 | AI Agents and Applications ch.14 "Productionizing AI agents: Memory, guardrails, and beyond"; Hands-On GenAI appendix "LLM Memory Requirements" | FREE |

**Winning combo for this query**: Pattern A core (`arxiv_search` + `semantic_scholar`) plus conditional free fanout. `internet_archive_search` adds older/pre-current-keyword and patent/proceedings context; `oreilly_search` adds practitioner LLM/agent-memory chapters. No paid escalation is needed when these free fanouts produce useful context. DBLP returned 0 hits (noise). Crossref's keyword ranking surfaced a 2023 paper the others missed, so it is worth running as a third cross-check for surveys and prior art.

## Composition

- Metric hill-climb of a target file -> use the `autoresearch` skill (different tool).
- Debate/multi-perspective decisions -> use `swarm-templates/research-council`.
- Autonomous discovery loop -> use `/!:discover` or `/!:autoresearch` (slash commands).
- This skill = **literature search orchestration only**.

## Quick recipes

```
# "SOTA map of hebbian memory"
-> Pattern B with year=2023-2026, top-10 by (year, cites)

# "Find papers by Hinton on forward-forward"
-> openalex_get_author "Geoffrey Hinton" -> filter works by title contains "forward-forward"
   OR semantic_scholar query "forward-forward Hinton"

# "Is DOI 10.1038/nature12373 open access?"
-> unpaywall_find_open doi=10.1038/nature12373 email=parm@iki.fi (returns best_oa_location.url_for_pdf if OA)

# "Recent CS papers about X (last 30 days)"
-> arxiv_search search_query="all:X AND cat:cs.*" sortBy=submittedDate max_results=20
   (sorting returns newest first but does not filter by date; drop hits older than 30 days)
```

**2026.05.05** | supersedes `Academic: context7 | arXiv` line in core-minimal.md
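Step 3 of pattern A and the citation-ready output format can be sketched in Python. Step 3 dedupes by DOI/arXiv ID, then sorts by year and then citations. This is a minimal sketch under stated assumptions, not the behavior of any `fulcrum:` tool. The following are all illustrative assumptions:

- the hypothetical `Hit` record and its field names;
- treating a `10.48550/arXiv.*` DOI as the same work as the bare arXiv ID;
- the "first three authors, then et al." rule, which is inferred from the example lines.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional

ARXIV_DOI_PREFIX = "10.48550/arxiv."  # arXiv-minted DOIs, e.g. 10.48550/arXiv.2507.21474


@dataclass
class Hit:
    """A search hit normalized from any tool (hypothetical shape, not a gateway type)."""
    title: str
    authors: List[str]
    year: int
    arxiv_id: Optional[str] = None
    doi: Optional[str] = None
    venue: Optional[str] = None
    cites: int = 0


def identifiers(hit: Hit) -> set:
    """Return every ID that can mark two hits as the same work."""
    ids = set()
    if hit.arxiv_id:
        ids.add("arxiv:" + hit.arxiv_id.lower())
    if hit.doi:
        doi = hit.doi.lower()
        # Assumption: an arXiv DOI names the same record as the bare arXiv ID.
        if doi.startswith(ARXIV_DOI_PREFIX):
            ids.add("arxiv:" + doi[len(ARXIV_DOI_PREFIX):])
        else:
            ids.add("doi:" + doi)
    if not ids:
        # Fallback when a source returns neither ID: whitespace-normalized title.
        ids.add("title:" + " ".join(hit.title.lower().split()))
    return ids


def merge_hits(*sources: Iterable[Hit]) -> List[Hit]:
    """Dedupe by DOI/arXiv ID across sources, then sort by (year desc, cites desc)."""
    merged: List[Hit] = []
    index = {}  # identifier -> position in merged
    for source in sources:
        for hit in source:
            ids = identifiers(hit)
            pos = next((index[i] for i in ids if i in index), None)
            if pos is None:
                for i in ids:
                    index[i] = len(merged)
                merged.append(replace(hit))  # copy so inputs are not mutated
                continue
            kept = merged[pos]
            # Same work seen twice: keep the highest citation count and fill gaps.
            kept.cites = max(kept.cites, hit.cites)
            kept.arxiv_id = kept.arxiv_id or hit.arxiv_id
            kept.doi = kept.doi or hit.doi
            kept.venue = kept.venue or hit.venue
            for i in ids:
                index.setdefault(i, pos)
    merged.sort(key=lambda h: (-h.year, -h.cites))
    return merged


def format_hit(hit: Hit, max_authors: int = 3) -> str:
    """Render one hit as a single line in the citation-ready output format."""
    authors = ", ".join(hit.authors[:max_authors])
    if len(hit.authors) > max_authors:
        authors += " et al."
    parts = []
    if hit.arxiv_id:
        parts.append(f"arXiv:{hit.arxiv_id}")
    # Show the DOI when known; otherwise fall back to the venue name.
    if hit.doi:
        parts.append(f"doi:{hit.doi}")
    elif hit.venue:
        parts.append(hit.venue)
    parts.append(f"cites:{hit.cites}")
    return f"{hit.title} \u2014 {authors} ({hit.year}) " + " | ".join(parts)


if __name__ == "__main__":
    # The same 2025 paper as returned by two different tools.
    from_arxiv = [Hit("Hebbian Memory-Augmented Recurrent Networks", ["Szelogowski"], 2025,
                      arxiv_id="2507.21474")]
    from_s2 = [Hit("Hebbian Memory-Augmented Recurrent Networks", ["Szelogowski"], 2025,
                   doi="10.48550/arXiv.2507.21474", cites=2)]
    for hit in merge_hits(from_arxiv, from_s2):
        print(format_hit(hit))
```

Running it prints the Szelogowski line from the Output format example. The sketch has one known gap: suppose a later hit shows that two separate entries are the same work, because it shares an ID with each. The sketch does not merge those two entries. A union-find over identifiers would handle that case.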