--- name: "dbsnp-database" description: "Query NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieve alleles, MAF, variant class (SNV/indel/MNV), clinical links, cross-DB IDs (ClinVar, dbVar, 1000G). Free; 3 req/sec (10 with key). For clinical pathogenicity use clinvar-database; for population frequencies use gnomad-database." license: "CC0-1.0" --- # dbSNP Database ## Overview NCBI dbSNP is the primary public repository for short human genetic variants, cataloguing over 1 billion SNPs, indels, and MNVs with allele frequencies, functional annotations, and cross-references to ClinVar, gnomAD, and 1000 Genomes. Variants are identified by stable rsIDs (reference SNP cluster IDs). Access is free via two APIs: the legacy NCBI E-utilities and the newer NCBI Variation Services REST API, which returns structured JSON. ## When to Use - Looking up allele frequencies and variant class for a known rsID - Searching all dbSNP variants in a gene or chromosomal region by name or coordinates - Resolving rsIDs to genomic coordinates (GRCh38/GRCh37) and HGVS notation - Checking whether a variant of interest has clinical significance links to ClinVar entries - Batch-fetching hundreds of rsIDs efficiently using epost+efetch history server - Cross-referencing a list of variant positions to dbSNP rsIDs for downstream annotation - For clinical pathogenicity classifications use `clinvar-database`; dbSNP provides IDs and frequency but not curated clinical significance - For population frequency stratified by ancestry use `gnomad-database`; dbSNP MAF is a single aggregate frequency ## Prerequisites - **Python packages**: `requests`, `pandas`, `matplotlib`, `xml.etree.ElementTree` (stdlib) - **Data requirements**: rsIDs (`rs80357906`), gene symbols, or chromosomal coordinates - **Environment**: internet connection; NCBI Entrez email required for E-utilities (set `email` parameter) - **Rate limits**: 3 requests/second without API key; 10 requests/second with free NCBI API key. Register at https://www.ncbi.nlm.nih.gov/account/ — add `&api_key=YOUR_KEY` to all requests ```bash pip install requests pandas matplotlib # xml.etree.ElementTree is part of Python stdlib — no additional install needed ``` ## Quick Start ```python import requests import json EMAIL = "your@email.com" # required by NCBI policy BASE_EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" BASE_VARIATION = "https://api.ncbi.nlm.nih.gov/variation/v0" def fetch_snp_by_rsid(rsid: str) -> dict: """Fetch a dbSNP record by rsID using the NCBI Variation Services API (structured JSON).""" rs_num = str(rsid).lstrip("rs") r = requests.get(f"{BASE_VARIATION}/refsnp/{rs_num}", timeout=15) r.raise_for_status() return r.json() record = fetch_snp_by_rsid("rs80357906") print(f"rsID: rs{record['refsnp_id']}") print(f"Organism: {record.get('organism', {}).get('common_name')}") print(f"SNP class: {record.get('primary_snapshot_data', {}).get('variant_type')}") # rsID: rs80357906 # Organism: human # SNP class: snv ``` ## Core API ### Query 1: rsID Lookup via E-utilities Fetch the full SNP record for a single rsID using efetch with `db=snp`. Returns an XML document with alleles, placements, and frequency data. ```python import requests import xml.etree.ElementTree as ET EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def efetch_snp_xml(rsid: str) -> ET.Element: """Fetch dbSNP XML record for a single rsID.""" rs_num = str(rsid).lstrip("rs") r = requests.get(f"{BASE}/efetch.fcgi", params={"db": "snp", "id": rs_num, "rettype": "xml", "retmode": "xml", "email": EMAIL}, timeout=20) r.raise_for_status() return ET.fromstring(r.text) root = efetch_snp_xml("rs80357906") # Parse allele placements from the XML for docsum in root.iter("DocumentSummary"): rs_id = docsum.get("uid") snp_class = docsum.findtext("SNP_CLASS", "Unknown") maf = docsum.findtext("MAF", "N/A") maf_allele = docsum.findtext("MAFALLELE", "N/A") chr_pos = docsum.findtext("CHRPOS", "N/A") gene = docsum.findtext("GENES/GENE_E/NAME", "N/A") clin_sig = docsum.findtext("CLINICAL_SIGNIFICANCE", "N/A") print(f"rs{rs_id} | Class: {snp_class} | MAF: {maf} ({maf_allele})") print(f" Position: {chr_pos} | Gene: {gene} | ClinSig: {clin_sig}") ``` ```python # Fetch using ESummary for structured JSON (preferred for batch) def esummary_snp(rsid: str) -> dict: rs_num = str(rsid).lstrip("rs") r = requests.get(f"{BASE}/esummary.fcgi", params={"db": "snp", "id": rs_num, "retmode": "json", "email": EMAIL}, timeout=15) r.raise_for_status() result = r.json()["result"] return result.get(rs_num, {}) rec = esummary_snp("rs80357906") print(f"rs{rec.get('snp_id')}:") print(f" Class : {rec.get('snp_class')}") print(f" MAF : {rec.get('maf')} ({rec.get('mafallele')})") print(f" ChrPos : {rec.get('chrpos')}") print(f" ClinSig : {rec.get('clinical_significance')}") print(f" FxnClass : {rec.get('fxn_class')}") ``` ### Query 2: Gene Variant Search Search dbSNP for all variants in a gene using esearch. Returns a list of rsIDs matching the gene. ```python import requests EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def esearch_snp(query: str, retmax: int = 100) -> tuple[list, int]: """Search dbSNP using a query string. Returns (id_list, total_count).""" r = requests.get(f"{BASE}/esearch.fcgi", params={"db": "snp", "term": query, "retmax": retmax, "retmode": "json", "email": EMAIL}, timeout=15) r.raise_for_status() result = r.json()["esearchresult"] return result["idlist"], int(result["count"]) # All variants in BRCA1 ids, total = esearch_snp("BRCA1[gene] AND human[orgn]", retmax=20) print(f"BRCA1 variants in dbSNP: {total:,} total") print(f"First 5 rsIDs: {['rs' + i for i in ids[:5]]}") # Only clinical variants (linked to ClinVar) ids_clin, total_clin = esearch_snp( "BRCA1[gene] AND human[orgn] AND clinsig[filter]", retmax=50) print(f"BRCA1 variants with clinical significance: {total_clin:,}") ``` ### Query 3: Chromosomal Region Search Search for all variants in a genomic region using chromosome coordinates. ```python import requests EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def search_region(chrom: str, start: int, stop: int, assembly: str = "GRCh38", retmax: int = 200) -> tuple[list, int]: """Find all dbSNP variants in a chromosomal region.""" query = f"{chrom}[CHR] AND {start}:{stop}[CHRPOS37]" if assembly == "GRCh37" else \ f"{chrom}[CHR] AND {start}:{stop}[CHRPOS]" r = requests.get(f"{BASE}/esearch.fcgi", params={"db": "snp", "term": query, "retmax": retmax, "retmode": "json", "email": EMAIL}, timeout=20) r.raise_for_status() result = r.json()["esearchresult"] return result["idlist"], int(result["count"]) # PCSK9 exon 4 region (GRCh38) ids, total = search_region("1", 55039700, 55040200) print(f"Variants in chr1:55039700-55040200: {total:,} total") print(f"Retrieved {len(ids)} rsIDs: {['rs' + i for i in ids[:5]]}") ``` ### Query 4: Variant Summary — MAF, Alleles, Clinical Significance Retrieve structured summary data for variant records using ESummary, extracting MAF, alleles, and database cross-links. ```python import requests, json EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def fetch_snp_summaries(rsids: list) -> dict: """Fetch ESummary records for a list of rsIDs. Returns dict keyed by rs number.""" ids_str = ",".join(str(r).lstrip("rs") for r in rsids) r = requests.post(f"{BASE}/esummary.fcgi", data={"db": "snp", "id": ids_str, "retmode": "json", "email": EMAIL}, timeout=20) r.raise_for_status() return r.json()["result"] rsids = ["rs80357906", "rs80357220", "rs28897672", "rs1801133"] result = fetch_snp_summaries(rsids) for rs_num in rsids: uid = str(rs_num).lstrip("rs") rec = result.get(uid, {}) print(f"\n{rs_num}:") print(f" Class : {rec.get('snp_class', 'N/A')}") print(f" MAF : {rec.get('maf', 'N/A')} allele={rec.get('mafallele', 'N/A')}") print(f" Location : {rec.get('chrpos', 'N/A')}") print(f" ClinSig : {rec.get('clinical_significance', 'N/A')}") print(f" Function : {rec.get('fxn_class', 'N/A')}") ``` ### Query 5: Batch rsID Query with EPost+EFetch Efficiently upload hundreds of rsIDs to the NCBI history server using EPost, then retrieve them in batches with EFetch. ```python import requests, time, pandas as pd EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def epost_ids(id_list: list) -> tuple[str, str]: """Upload rsIDs to NCBI history server. Returns (WebEnv, query_key).""" ids_str = ",".join(str(i).lstrip("rs") for i in id_list) r = requests.post(f"{BASE}/epost.fcgi", data={"db": "snp", "id": ids_str, "email": EMAIL}, timeout=30) r.raise_for_status() import xml.etree.ElementTree as ET root = ET.fromstring(r.text) webenv = root.findtext("WebEnv") query_key = root.findtext("QueryKey") return webenv, query_key def efetch_history(webenv: str, query_key: str, retstart: int = 0, retmax: int = 100) -> dict: """Retrieve records from NCBI history server using ESummary.""" r = requests.get(f"{BASE}/esummary.fcgi", params={"db": "snp", "WebEnv": webenv, "query_key": query_key, "retstart": retstart, "retmax": retmax, "retmode": "json", "email": EMAIL}, timeout=30) r.raise_for_status() return r.json()["result"] rsid_batch = ["rs80357906", "rs80357220", "rs28897672", "rs1801133", "rs429358", "rs7412", "rs1800497"] webenv, query_key = epost_ids(rsid_batch) print(f"Posted {len(rsid_batch)} IDs | WebEnv: {webenv[:40]}...") records = [] for start in range(0, len(rsid_batch), 100): result = efetch_history(webenv, query_key, retstart=start, retmax=100) for uid in result.get("uids", []): rec = result[uid] records.append({ "rsid": f"rs{uid}", "snp_class": rec.get("snp_class"), "maf": rec.get("maf"), "maf_allele": rec.get("mafallele"), "chrpos": rec.get("chrpos"), "clinical_sig": rec.get("clinical_significance"), }) time.sleep(0.5) df = pd.DataFrame(records) print(f"\nRetrieved {len(df)} records:") print(df.to_string(index=False)) ``` ### Query 6: NCBI Variation Services API The newer REST API returns structured JSON with detailed allele placements, frequencies, and variant type annotations. Preferred for programmatic rsID resolution. ```python import requests import pandas as pd BASE_VARIATION = "https://api.ncbi.nlm.nih.gov/variation/v0" def fetch_refsnp(rsid: str) -> dict: """Fetch structured JSON from NCBI Variation Services API.""" rs_num = str(rsid).lstrip("rs") r = requests.get(f"{BASE_VARIATION}/refsnp/{rs_num}", timeout=15) r.raise_for_status() return r.json() def parse_allele_frequencies(record: dict) -> list: """Extract allele frequencies from a Variation Services record.""" freqs = [] snapshot = record.get("primary_snapshot_data", {}) allele_annotations = snapshot.get("allele_annotations", []) for ann in allele_annotations: for freq in ann.get("frequency", []): freqs.append({ "allele": ann.get("allele"), "study": freq.get("study_name"), "allele_count": freq.get("allele_count"), "total_count": freq.get("total_count"), "observation": freq.get("observation", {}).get("allele_count"), }) return freqs record = fetch_refsnp("rs1800497") # DRD2 Taq1A variant print(f"rsID: rs{record['refsnp_id']}") print(f"Variant type: {record.get('primary_snapshot_data', {}).get('variant_type')}") placements = record.get("primary_snapshot_data", {}).get("placements_with_allele", []) for placement in placements[:2]: seq_id = placement.get("seq_id") is_top = placement.get("is_ptlp") for allele in placement.get("alleles", []): spdi = allele.get("allele", {}).get("spdi", {}) print(f" Placement: {seq_id} | SPDI: {spdi.get('inserted_sequence')}") freqs = parse_allele_frequencies(record) if freqs: df_freq = pd.DataFrame(freqs[:5]) print(f"\nAllele frequencies ({len(freqs)} entries):") print(df_freq.to_string(index=False)) ``` ## Key Concepts ### rsID vs. ss ID (Submitted SNP) dbSNP uses two IDs: **rs IDs** (Reference SNP cluster IDs) are stable public identifiers assigned after clustering submitted variants. **ss IDs** (Submitted SNP IDs) are assigned to individual laboratory submissions before clustering. Use rs IDs for all queries — ss IDs are internal and submission-specific. A single rs ID may cluster multiple ss IDs from different submissions. ### MAF vs. Clinical Significance - **MAF (Minor Allele Frequency)**: The aggregate frequency of the minor allele across all dbSNP submissions. This is a population-level statistic aggregated from 1000 Genomes, gnomAD, TOPMED, and other studies. It does not indicate whether the variant is pathogenic. - **Clinical significance**: A link field pointing to ClinVar classifications (Pathogenic, VUS, Benign, etc.). A variant can have a very low MAF (rare) yet be classified as Benign, or a moderate MAF yet be Pathogenic in a specific context (e.g., founder variants). Use `clinvar-database` for the full pathogenicity record. ### Variant Classes in dbSNP | Class | Description | Example | |-------|-------------|---------| | `snv` | Single nucleotide variant (A>T) | rs80357906 | | `indel` | Insertion or deletion | rs786201005 | | `mnv` | Multi-nucleotide variant | rs1057519737 | | `ins` | Pure insertion | rs113993960 | | `del` | Pure deletion | rs66767301 | | `microsatellite` | STR (short tandem repeat) | rs5030655 | ## Common Workflows ### Workflow 1: Batch rsID Annotation from a Variant List **Goal**: Given a list of rsIDs from a variant call pipeline, retrieve MAF, position, and clinical significance for all variants in one run. ```python import requests, time, pandas as pd EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def epost_snp(rsids): ids = ",".join(str(r).lstrip("rs") for r in rsids) r = requests.post(f"{BASE}/epost.fcgi", data={"db": "snp", "id": ids, "email": EMAIL}, timeout=30) r.raise_for_status() import xml.etree.ElementTree as ET root = ET.fromstring(r.text) return root.findtext("WebEnv"), root.findtext("QueryKey") def esummary_history(webenv, query_key, start, retmax=100): r = requests.get(f"{BASE}/esummary.fcgi", params={"db": "snp", "WebEnv": webenv, "query_key": query_key, "retstart": start, "retmax": retmax, "retmode": "json", "email": EMAIL}, timeout=30) r.raise_for_status() return r.json()["result"] # Example: VCF post-processing — annotate a list of called variants variant_rsids = [ "rs80357906", "rs80357220", "rs28897672", "rs1801133", "rs429358", "rs7412", "rs1800497", "rs2230199", ] print(f"Posting {len(variant_rsids)} rsIDs to NCBI history server...") webenv, query_key = epost_snp(variant_rsids) records = [] for start in range(0, len(variant_rsids), 100): result = esummary_history(webenv, query_key, start=start, retmax=100) for uid in result.get("uids", []): rec = result[uid] records.append({ "rsid": f"rs{uid}", "snp_class": rec.get("snp_class"), "maf": rec.get("maf"), "maf_allele": rec.get("mafallele"), "chrpos_grch38": rec.get("chrpos"), "gene": rec.get("genes", [{}])[0].get("name") if rec.get("genes") else None, "clinical_significance": rec.get("clinical_significance"), "fxn_class": rec.get("fxn_class"), }) time.sleep(0.5) df = pd.DataFrame(records) df.to_csv("variant_annotations.csv", index=False) print(f"Annotated {len(df)} variants → variant_annotations.csv") print(df[["rsid", "snp_class", "maf", "chrpos_grch38", "clinical_significance"]].to_string(index=False)) ``` ### Workflow 2: Gene Variant Class Distribution Visualization **Goal**: Search all dbSNP variants in a gene and plot their variant class distribution. ```python import requests, time import pandas as pd import matplotlib.pyplot as plt EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def search_gene_variants(gene: str, retmax: int = 500) -> list: r = requests.get(f"{BASE}/esearch.fcgi", params={"db": "snp", "term": f"{gene}[gene] AND human[orgn]", "retmax": retmax, "retmode": "json", "email": EMAIL}, timeout=20) r.raise_for_status() return r.json()["esearchresult"]["idlist"] def fetch_summaries_batch(ids: list, batch_size: int = 100) -> list: records = [] for i in range(0, len(ids), batch_size): batch = ids[i:i+batch_size] ids_str = ",".join(batch) r = requests.post(f"{BASE}/esummary.fcgi", data={"db": "snp", "id": ids_str, "retmode": "json", "email": EMAIL}, timeout=30) r.raise_for_status() result = r.json()["result"] for uid in result.get("uids", []): rec = result[uid] records.append({"rsid": f"rs{uid}", "snp_class": rec.get("snp_class", "unknown")}) time.sleep(0.5) return records gene = "CFTR" print(f"Searching dbSNP for {gene} variants...") ids = search_gene_variants(gene, retmax=300) print(f"Found {len(ids)} IDs; fetching summaries...") records = fetch_summaries_batch(ids) df = pd.DataFrame(records) class_counts = df["snp_class"].value_counts() # Plot variant class distribution fig, ax = plt.subplots(figsize=(8, 5)) colors = ["#4472C4", "#ED7D31", "#A9D18E", "#FF0000", "#FFC000", "#7030A0"] bars = ax.bar(class_counts.index, class_counts.values, color=colors[:len(class_counts)], edgecolor="white") ax.bar_label(bars, padding=3, fontsize=9) ax.set_xlabel("Variant Class") ax.set_ylabel("Count") ax.set_title(f"dbSNP Variant Classes in {gene} (n={len(df)})") plt.tight_layout() plt.savefig(f"{gene}_variant_classes.png", dpi=150, bbox_inches="tight") print(f"Saved {gene}_variant_classes.png") print(class_counts.to_string()) # snv 241 # indel 38 # mnv 12 # del 7 # ins 2 ``` ## Key Parameters | Parameter | Function/Endpoint | Default | Range / Options | Effect | |-----------|-------------------|---------|-----------------|--------| | `db` | All E-utilities | required | `"snp"` | Database selector; must be `"snp"` for dbSNP queries | | `id` | efetch, esummary, epost | required | rsID number(s) without `rs` prefix | Variant identifier(s) to fetch | | `term` | esearch | required | dbSNP query string | Search expression with field tags: `[gene]`, `[CHR]`, `[CHRPOS]`, `[rs]`, `clinsig[filter]` | | `retmax` | esearch | `20` | `1`–`10000` | Maximum records returned per search | | `retmode` | esearch, esummary | `"xml"` | `"json"`, `"xml"` | Response format; use `"json"` for easy parsing | | `rettype` | efetch | `"docsum"` | `"docsum"`, `"xml"` | Record type for efetch responses | | `WebEnv` + `query_key` | esummary, efetch | — | from epost response | History server tokens for batch retrieval | | `email` | All E-utilities | required | valid email string | NCBI policy; used for rate attribution | | `api_key` | All E-utilities | optional | NCBI API key string | Raises rate limit from 3 to 10 req/sec | ## Best Practices 1. **Register for a free NCBI API key**: Adds `api_key=YOUR_KEY` to requests and triples your rate limit (3 → 10 req/sec) with no other changes. Register at https://www.ncbi.nlm.nih.gov/account/. 2. **Use epost+esummary for batches of more than 10 rsIDs**: Avoid looping individual efetch calls. EPost uploads all IDs in one request to the history server; subsequent ESummary calls retrieve them in configurable batches of up to 500. 3. **Prefer NCBI Variation Services API for structured JSON**: The `/variation/v0/refsnp/{rs_num}` endpoint returns a fully structured JSON with SPDI allele representations, placements, and frequency tables. Easier to parse than E-utilities XML for modern applications. 4. **Check `snp_class` before interpreting MAF**: Indels and MNVs use different allele counting conventions than SNVs. Treat multi-allelic sites carefully — the reported MAF may refer to one allele among several. 5. **Combine dbSNP with ClinVar lookups**: dbSNP records the `clinical_significance` field as a string (e.g., `"pathogenic"`) but does not contain submitter details, review status, or HGVS details. Use `clinvar-database` for the full pathogenicity record when clinical interpretation is required. ## Common Recipes ### Recipe: Quick rsID Existence and Class Check When to use: Check whether a variant is registered in dbSNP before downstream annotation. ```python import requests EMAIL = "your@email.com" def check_rsid(rsid: str) -> dict: """Check if an rsID exists in dbSNP and return basic info.""" rs_num = str(rsid).lstrip("rs") r = requests.get( "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi", params={"db": "snp", "id": rs_num, "retmode": "json", "email": EMAIL}, timeout=10 ) result = r.json().get("result", {}) rec = result.get(rs_num, {}) if not rec or "error" in rec: return {"rsid": rsid, "found": False} return { "rsid": rsid, "found": True, "snp_class": rec.get("snp_class"), "chrpos": rec.get("chrpos"), "maf": rec.get("maf"), "clinical_significance": rec.get("clinical_significance"), } for rsid in ["rs80357906", "rs9999999999", "rs1800497"]: info = check_rsid(rsid) if info["found"]: print(f"{rsid}: {info['snp_class']} | pos={info['chrpos']} | MAF={info['maf']}") else: print(f"{rsid}: NOT FOUND in dbSNP") # rs80357906: snv | pos=17:43094692 | MAF=0.000008 # rs9999999999: NOT FOUND in dbSNP # rs1800497: snv | pos=11:113270828 | MAF=0.3517 ``` ### Recipe: Resolve Gene Variants to rsIDs and Coordinates When to use: Convert a gene name to a list of rsIDs for use in downstream tools (PLINK, ANNOVAR, etc.). ```python import requests, time, pandas as pd EMAIL = "your@email.com" BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils" def gene_to_rsids(gene: str, max_variants: int = 200) -> pd.DataFrame: """Search dbSNP for a gene and return rsIDs with GRCh38 coordinates.""" # Step 1: search r = requests.get(f"{BASE}/esearch.fcgi", params={"db": "snp", "term": f"{gene}[gene] AND human[orgn]", "retmax": max_variants, "retmode": "json", "email": EMAIL}, timeout=15) r.raise_for_status() ids = r.json()["esearchresult"]["idlist"] if not ids: return pd.DataFrame() time.sleep(0.4) # Step 2: fetch summaries r2 = requests.post(f"{BASE}/esummary.fcgi", data={"db": "snp", "id": ",".join(ids), "retmode": "json", "email": EMAIL}, timeout=30) r2.raise_for_status() result = r2.json()["result"] rows = [] for uid in result.get("uids", []): rec = result[uid] rows.append({ "rsid": f"rs{uid}", "chrpos_grch38": rec.get("chrpos"), "snp_class": rec.get("snp_class"), "maf": rec.get("maf"), }) return pd.DataFrame(rows) df = gene_to_rsids("APOE", max_variants=50) print(f"APOE variants retrieved: {len(df)}") print(df.head(8).to_string(index=False)) df.to_csv("APOE_rsids.csv", index=False) ``` ## Troubleshooting | Problem | Cause | Solution | |---------|-------|----------| | `HTTP 429` or connection refused | Rate limit exceeded (3 req/sec) | Add `time.sleep(0.35)` between requests; register for API key to get 10 req/sec | | ESummary returns `{"error": "Invalid uid"}` | rsID does not exist in dbSNP | Check rsID spelling; verify with NCBI browser; variant may be a novel call not yet in dbSNP | | `esearch` returns 0 results for a gene | Gene symbol mismatch or missing `human[orgn]` filter | Try adding `AND human[orgn]`; check NCBI gene symbol at https://www.ncbi.nlm.nih.gov/gene | | MAF field is empty or `"."` | Variant has no population frequency data in dbSNP | Use gnomAD via `gnomad-database` for population frequencies; not all rsIDs have MAF | | Variation Services API returns 404 | rsID not found or wrong URL format | Confirm integer rs number (no `rs` prefix) in `/refsnp/{rs_num}` endpoint | | EPost XML parsing fails | Non-XML response (rate limit HTML error page) | Check response status code first; add retry logic with `time.sleep(1)` | | Batch efetch returns fewer records than posted | Some rsIDs were merged or retired | Cross-check against NCBI merge history; retired rsIDs redirect to current active rs | ## Related Skills - `clinvar-database` — ClinVar pathogenicity classifications for variants identified by rsID (complement to dbSNP) - `gnomad-database` — Population allele frequencies by ancestry group (more detailed than dbSNP MAF) - `gwas-database` — GWAS Catalog for SNP-trait associations from published GWAS studies - `ensembl-database` — Ensembl REST API for variant consequences and gene annotations - `snpeff-variant-annotation` — Annotate VCF files with SnpEff and SnpSift, which adds dbSNP rsIDs and functional predictions ## References - [NCBI E-utilities for dbSNP](https://www.ncbi.nlm.nih.gov/snp/docs/entrez_help/) — E-utilities field tags and query syntax specific to dbSNP - [NCBI Variation Services API](https://api.ncbi.nlm.nih.gov/variation/v0/) — REST API documentation and interactive Swagger UI - [Sherry et al., Nucleic Acids Research 2001](https://doi.org/10.1093/nar/29.1.308) — dbSNP original description paper - [NCBI E-utilities Reference](https://www.ncbi.nlm.nih.gov/books/NBK25499/) — Full E-utilities API reference (applies to all NCBI databases)