--- name: "pubmed-database" description: "PubMed Database workflow skill. Use this skill when the user needs direct REST API access to PubMed. Advanced Boolean and MeSH queries, E-utilities API, batch processing, and citation-oriented retrieval. For Python workflows, prefer Biopython (Bio.Entrez). Use this skill for direct HTTP/REST work or custom API implementations, and preserve upstream workflow intent, copied support files, and provenance before handoff." version: "0.0.1" category: "backend" tags: - "pubmed-database" - "direct" - "rest" - "api" - "access" - "pubmed" - "advanced" - "boolean" - "omni-enhanced" complexity: "advanced" risk: "caution" tools: - "codex-cli" - "claude-code" - "cursor" - "gemini-cli" - "opencode" source: "omni-team" author: "Omni Skills Team" date_added: "2026-04-15" date_updated: "2026-04-19" source_type: "omni-curated" maintainer: "Omni Skills Team" family_id: "pubmed-database" family_name: "PubMed Database" variant_id: "omni" variant_label: "Omni Curated" is_default_variant: true derived_from: "skills/pubmed-database" upstream_skill: "skills/pubmed-database" upstream_author: "sickn33" upstream_source: "community" upstream_pr: "79" upstream_head_repo: "diegosouzapw/awesome-omni-skills" upstream_head_sha: "6bf093920a93e68fa8263cf6ee767d7407989d56" curation_surface: "skills_omni" enhanced_origin: "omni-skills-private" source_repo: "diegosouzapw/awesome-omni-skills" replaces: - "pubmed-database" --- # PubMed Database ## Overview This skill supports direct, official programmatic access to PubMed through NCBI Entrez E-utilities. Use it when the task needs reproducible biomedical literature retrieval, fielded or MeSH-aware searching, batch export, citation-oriented extraction, or a custom integration that should stay aligned with official PubMed and NCBI behavior. Keep the original upstream intent intact: this skill exists for direct REST access and custom workflows. 
For Python implementations, prefer `Bio.Entrez` as a client wrapper, but design and verify the workflow in terms of the underlying E-utilities semantics first. Do **not** fall back to scraping PubMed HTML pages when E-utilities already expose the needed data.

## When to Use This Skill

Activate this skill when one or more of these are true:

- You need **direct HTTP access** to PubMed or Entrez E-utilities.
- You need a **repeatable, auditable search strategy** rather than an ad hoc UI search.
- You must construct **advanced Boolean, field-tagged, date-bounded, publication-type, or MeSH-informed queries**.
- You need to **retrieve many records in batches** without manually copying PMIDs.
- You need to compare **ESearch, ESummary, EFetch, and ELink** for the same workflow.
- You are building a custom integration for **citation metadata, abstracts, identifiers, or related-record lookup**.
- You must verify how PubMed interpreted a query before exporting or analyzing results.

Do **not** use this skill as the first choice when:

- The user only needs a quick manual literature search in the PubMed web UI.
- The task is purely Python automation and a higher-level client already covers the needed behavior; in that case, still use this skill for query design and API semantics, but implement with `Bio.Entrez`.
- The task requires unsupported data collection patterns such as HTML scraping or aggressive harvesting.
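To make the "design in E-utilities terms, implement with `Bio.Entrez` when Python fits" preference concrete, here is a hedged sketch. The raw parameter dictionary is the design artifact; the Biopython import is guarded so the sketch stands alone when Biopython is not installed, and the tool name and email are placeholders:

```python
# The REST-level design: plain E-utilities parameters, independent of client.
params = {
    "db": "pubmed",
    "term": "aspirin[Title/Abstract]",
    "retmax": 20,
    "usehistory": "y",
}

try:
    from Bio import Entrez  # optional higher-level client (Biopython)

    Entrez.tool = "my-skill"          # placeholder caller identifier
    Entrez.email = "you@example.org"  # placeholder contact address

    def run_search():
        # Bio.Entrez.esearch forwards these same E-utilities parameters.
        handle = Entrez.esearch(**params)
        record = Entrez.read(handle)
        handle.close()
        return record
except ImportError:
    Entrez = None  # fall back to direct REST calls using the params above
```

Keeping the parameters in one place makes it easy to verify later that the REST path and the `Bio.Entrez` path ran the same logical search.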
## Operating Table

| Situation | Start here | Why it matters |
| --- | --- | --- |
| Choosing the right E-utility | `references/integration-patterns.md` | Helps decide between ESearch, ESummary, EFetch, and ELink before building requests |
| Designing a reproducible query | `examples/request-response-example.md` | Shows fielded search, translation checks, and history-server usage with concrete request patterns |
| Mapping output fields for extraction | `assets/schema-map.json` | Gives a compact machine-readable map for common citation, abstract, journal, and identifier extraction goals |
| First production call | This `SKILL.md` | Establishes safe request structure, identification, batching, and troubleshooting |
| Python implementation | This `SKILL.md`, then `examples/request-response-example.md` | Keeps REST semantics primary, then shows a Bio.Entrez equivalent without changing policy obligations |

## Workflow

### 1. Define the retrieval target

Clarify all of the following before making requests:

- Research question or operational objective
- Concepts, synonyms, abbreviations, and likely spelling variants
- Required filters such as date range, language, species, publication type, or journal
- Output need: counts only, lightweight summaries, full structured records, or related links
- Expected volume: a few records, hundreds, or a large result set needing pagination and checkpointing

For evidence-sensitive work such as systematic review support, combine **controlled vocabulary** and **free-text terms** deliberately instead of assuming one will fully cover the concept.

### 2. Build the query explicitly

Construct the query with field tags and Boolean logic instead of relying on vague free text.
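A fielded Boolean query can be assembled mechanically from concept groups. This is an illustrative helper (the function name and example terms are my own, not part of the skill's assets) that ORs synonyms within a group and ANDs the groups together, with parentheses keeping the logic explicit:

```python
def build_query(*concept_groups):
    """Each group is a list of already field-tagged terms; synonyms are
    ORed within a group, and groups are ANDed with explicit parentheses."""
    groups = ["(" + " OR ".join(g) + ")" for g in concept_groups]
    return " AND ".join(groups)

q = build_query(
    ['"myocardial infarction"[MeSH Terms]', "heart attack[Title/Abstract]"],
    ["aspirin[Title/Abstract]"],
)
# q: '("myocardial infarction"[MeSH Terms] OR heart attack[Title/Abstract]) AND (aspirin[Title/Abstract])'
```

Because the helper emits a plain string, the exact submitted query can be logged verbatim, which supports the reproducibility and audit goals above.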
Common patterns include:

- Title/abstract terms for recent phrasing: `term[Title/Abstract]`
- Author lookup: `Surname Initials[Author]`
- Journal restriction: `Journal Name[Journal]`
- Publication type: `randomized controlled trial[Publication Type]`
- Date restrictions with Entrez date parameters or explicit query clauses
- MeSH-driven concept expansion, often paired with free-text synonyms

Good practice:

- Quote phrases only when you want a phrase-level constraint.
- Use parentheses around concept groups.
- Keep a logged copy of the exact submitted query string.
- For recall-sensitive searches, pair MeSH with keyword synonyms rather than treating them as interchangeable.

### 3. Run `ESearch` first and inspect interpretation

Use `ESearch` to verify whether PubMed interpreted the query as intended. Minimum concerns to verify:

- `Count`
- Returned identifiers for a small first page
- Search interpretation or translation details when available
- Whether the query is unexpectedly broad or narrow

Do this **before** launching large exports. If the result set is non-trivial, prefer `usehistory=y` so downstream calls can reference `WebEnv` and `query_key` instead of copying large PMID lists through every step.

### 4. Decide the downstream retrieval utility

Choose the next step based on the actual output need:

- **ESearch**: find PMIDs, counts, and search interpretation
- **ESummary**: lightweight metadata review, screening support, fast record summaries
- **EFetch**: richer record retrieval for structured extraction, abstracts, identifiers, and detailed citation fields
- **ELink**: related-record, citation-link, or cross-database relationships when available

Do not assume `ESummary` and `EFetch` contain the same fields.

### 5. Batch safely for larger result sets

For larger jobs:

1. Call `ESearch` with `usehistory=y`
2. Capture and log `Count`, `WebEnv`, and `query_key`
3. Page through records with `retstart` and `retmax`
4. Retrieve with `ESummary` or `EFetch` in bounded batches
5. Log progress after each batch
6. Check cumulative retrieved records against the expected count

Operational guardrails:

- Use respectful pacing and bounded retries.
- Provide identifying request metadata such as tool and email as required by NCBI guidance.
- If using an API key, configure it explicitly rather than assuming higher throughput automatically applies.
- Do not make a single oversized request when a history-backed paginated workflow is safer.
- Checkpoint enough state to resume after interruption.

Recommended audit fields per batch:

- Query string
- Utility used
- `retstart`
- `retmax`
- Cumulative records written
- `WebEnv` and `query_key` when using the history server
- Timestamp and any retry events

### 6. Prefer machine-parseable formats for extraction

When building parsers or downstream transforms:

- Prefer structured formats such as XML when field reliability matters.
- Use `ESummary` only for summary-oriented metadata needs.
- Use `EFetch` when you need richer record content.
- Validate the requested `retmode` and `rettype` against the utility and extraction goal.

Use `assets/schema-map.json` as a compact reference for common extraction targets, but treat official NLM field documentation as authoritative for final interpretation.

### 7. Verify before analysis or handoff

Before handing results to another step or another operator, verify:

- The query returned the expected conceptual scope
- The total count was understood correctly
- Pagination covered the intended result set
- The chosen utility and format actually contain the required fields
- Any Bio.Entrez implementation matches the REST behavior for the same search or PMIDs

## Troubleshooting

### Problem: Unexpectedly broad or narrow results

Check:

- Whether PubMed automatic term mapping changed the meaning of the query
- Whether phrase quoting is too restrictive or too loose
- Whether field tags were omitted or applied to the wrong clause
- Whether MeSH terms, explosion behavior, or free-text synonyms are mismatched
- Whether date or publication-type filters are suppressing expected records

Action:

- Re-run a small `ESearch`
- Inspect translation behavior
- Compare a fielded query against a simpler baseline
- Log the exact before/after query strings

### Problem: Only the first page was retrieved

Cause:

- `retstart` / `retmax` pagination was not implemented, or `usehistory=y` was omitted for larger retrievals.

Action:

- Repeat `ESearch` with history enabled
- Capture `WebEnv` and `query_key`
- Page explicitly and reconcile total retrieved vs `Count`

### Problem: Missing fields in the response

Cause:

- The selected utility or format does not expose the needed field.
Action:

- Compare `ESummary` versus `EFetch`
- Verify `retmode` and `rettype`
- Check `assets/schema-map.json` for common expectations
- Confirm field semantics in official NLM documentation before changing the parser

### Problem: HTTP 429, temporary blocks, or unstable responses

Action:

- Slow down the request rate and reduce concurrency
- Add bounded backoff and retry with logging
- Confirm identifying metadata and API key configuration
- Prefer scheduled, history-based batch retrieval over bursty repeated searches
- Review current official NCBI usage guidance instead of assuming a fixed limit from memory

### Problem: `ELink` results look incomplete

Cause:

- Link coverage depends on the selected link name and on NCBI data availability.

Action:

- Verify the exact `linkname`
- Treat returned relationships as availability-dependent, not guaranteed complete citation coverage
- Record which link type was used in downstream outputs

### Problem: REST and Bio.Entrez outputs do not match

Action:

- Compare the exact database, utility, query, and format parameters
- Ensure both paths use the same IDs or the same history state
- Confirm parsing assumptions rather than assuming the client wrapper changed PubMed behavior

## Additional Resources

- `references/integration-patterns.md` for utility selection, history-server decisions, batching, and output-format notes
- `examples/request-response-example.md` for concrete REST requests, expected response elements, and a Bio.Entrez equivalent
- Official NCBI Entrez Programming Utilities Help: `https://www.ncbi.nlm.nih.gov/books/NBK25501/`
- Official PubMed User Guide: `https://pubmed.ncbi.nlm.nih.gov/help/`
- MeSH reference: `https://www.ncbi.nlm.nih.gov/mesh`
- Biopython Entrez tutorial: `https://biopython.org/docs/latest/Tutorial/chapter_entrez.html`
- Biopython `Bio.Entrez` API reference: `https://biopython.org/docs/latest/api/Bio.Entrez.html`
- NLM MEDLINE/PubMed field descriptions: `https://www.nlm.nih.gov/bsd/mms/medlineelements.html`

## Related Skills

Use a neighboring skill instead when the task drifts into:

- generic literature review planning without direct API work
- citation formatting only, without PubMed retrieval design
- Python-only implementation details that do not require direct REST workflow reasoning
- broader biomedical database comparison beyond PubMed and Entrez

## Notes on Upstream Intent and Provenance

This enhanced candidate preserves the upstream skill identity and scope: direct PubMed access, advanced query construction, E-utilities use, batch processing, and citation-oriented retrieval. The wording has been rewritten into an operator-focused playbook so the skill is safer and more executable without changing its core purpose.
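The history-backed batching described in workflow step 5 can be sketched offline. The XML below is a hand-written stand-in for a real ESearch `retmode=xml` response (the count, `WebEnv` token, and PMIDs are invented for illustration); the parsing relies only on element names documented for ESearch, and the page-offset helper shows how `retstart` values would be derived from `Count`:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an ESearch retmode=xml response with usehistory=y.
SAMPLE = """<eSearchResult>
  <Count>237</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>MCID_example</WebEnv>
  <IdList><Id>31452104</Id><Id>29262322</Id></IdList>
</eSearchResult>"""

def parse_esearch(xml_text):
    """Extract the fields the batching workflow logs per search."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "query_key": root.findtext("QueryKey"),
        "webenv": root.findtext("WebEnv"),
        "ids": [e.text for e in root.findall("IdList/Id")],
    }

def page_offsets(count, retmax):
    """retstart values needed to page through `count` records."""
    return list(range(0, count, retmax))

result = parse_esearch(SAMPLE)
offsets = page_offsets(result["count"], retmax=100)  # [0, 100, 200]
```

Each offset would become one `ESummary` or `EFetch` call referencing the captured `WebEnv` and `query_key`, with progress and retries logged per batch as recommended above.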