---
name: "gnomad-database"
description: "gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database."
license: "ODbL-1.0"
---

# gnomAD Database

## Overview

The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at `https://gnomad.broadinstitute.org/api` — no authentication required, no official SDK.

## When to Use

- Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
- Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
- Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
- Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
- Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
- Filtering a VCF by population frequency — query gnomAD AF to discard common variants before clinical interpretation
- For clinical pathogenicity classifications use `clinvar-database`; gnomAD provides frequency evidence but does not classify pathogenicity
- For GWAS associations at the study level use `gwas-database`; gnomAD is for population frequency lookups

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: gene symbols (e.g., `BRCA1`), variant IDs (`1-69511-A-G` format, or rsIDs)
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; use `time.sleep(0.5)` between requests for polite access; avoid bursts over 10 requests/second

```bash
pip install requests pandas matplotlib
```

## Quick Start

```python
import requests
import time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query: str, variables: dict = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

# Quick check: get pLI for BRCA1
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint { pLI lof { oe_ci { upper } } }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI: {constraint['pLI']:.3f}")
print(f"BRCA1 LOEUF: {constraint['lof']['oe_ci']['upper']:.3f}")
# BRCA1 pLI: 0.999
# BRCA1 LOEUF: 0.127
```

## Core API

### Query 1: Gene Variant Query

Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.

```python
import requests, time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""

data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": "PCSK9",
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['gene_name']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001)
rare = [v for v in variants if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f"  {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")
```

### Query 2: Variant Lookup

Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.

```python
VARIANT_QUERY = """
query VariantDetails($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    consequence
    lof
    lof_filter
    lof_flags
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations {
        id
        ac
        an
        af
      }
    }
  }
}
"""

data = gnomad_query(VARIANT_QUERY, {
    "variant_id": "1-55039974-G-T",   # PCSK9 p.Tyr142Ter (LoF)
    "dataset": "gnomad_r4"
})
v = data["variant"]
print(f"Variant: {v['variant_id']}")
print(f"rsIDs: {v['rsids']}")
print(f"Consequence: {v['consequence']}  |  LoF: {v['lof']}")
g = v["genome"]
print(f"Genome AF: {g['af']:.2e}  (AC={g['ac']}, AN={g['an']})")
print(f"FAF95 popmax: {g['faf95']['popmax']:.2e} in {g['faf95']['popmax_population']}")
```

### Query 3: Population Frequencies

Retrieve allele frequency broken down by ancestry group for a specific variant.

```python
import pandas as pd

POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations {
        id
        ac
        an
        af
        homozygote_count
      }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "African/African American",
    "amr": "Admixed American",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
    "asj": "Ashkenazi Jewish",
    "mid": "Middle Eastern",
    "oth": "Other",
}

data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": "1-55039974-G-T",
    "dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]

# Filter to top-level ancestry groups (exclude sex-specific)
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))
```

### Query 4: Coverage Query

Retrieve per-base read depth coverage for a gene region to assess data completeness.

```python
COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!) {
  coverage(dataset: $dataset, chrom: $chrom, start: $start, stop: $stop) {
    pos
    mean
    median
    over_1
    over_10
    over_20
    over_30
    over_100
  }
}
"""

data = gnomad_query(COVERAGE_QUERY, {
    "chrom": "1",
    "start": 55039700,
    "stop": 55040200,
    "dataset": "gnomad_r4"
})
cov = data["coverage"]
print(f"Coverage positions retrieved: {len(cov)}")
if cov:
    avg_mean = sum(c["mean"] for c in cov) / len(cov)
    pct_20x = sum(1 for c in cov if c["over_20"] > 0.9) / len(cov) * 100
    print(f"Average mean depth: {avg_mean:.1f}x")
    print(f"Positions with >90% samples at >=20x: {pct_20x:.1f}%")
    # Example single position
    c = cov[0]
    print(f"\nPosition {c['pos']}: mean={c['mean']:.1f}x, median={c['median']}x")
    print(f"  Fraction >=10x: {c['over_10']:.3f}, >=20x: {c['over_20']:.3f}, >=30x: {c['over_30']:.3f}")
```

### Query 5: Gene Constraint

Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.

```python
CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    gnomad_constraint {
      pLI
      pNull
      pRec
      mis_z
      lof {
        obs
        exp
        oe
        oe_ci { lower upper }
      }
    }
  }
}
"""

genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>6} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 35)
for gene in genes:
    data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
    c = data["gene"]["gnomad_constraint"]
    loeuf = c["lof"]["oe_ci"]["upper"]
    print(f"{gene:<10} {c['pLI']:>6.3f} {loeuf:>7.3f} {c['mis_z']:>7.2f}")
    time.sleep(0.5)
# Gene        pLI   LOEUF   mis_z
# PCSK9     0.855   0.543    2.11
# BRCA1     0.999   0.127    3.84
# TP53      0.993   0.191    5.21
# TTN       0.001   0.993   -2.40
```

### Query 6: Variant Search by Region

Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.

```python
REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
                     $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop,
         reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      rsids
      pos
      consequence
      lof
      genome {
        af
        ac
        an
        faf95 { popmax }
      }
    }
  }
}
"""

data = gnomad_query(REGION_VARIANTS_QUERY, {
    "chrom": "1",
    "start": 55039974,
    "stop": 55064852,   # PCSK9 coding region
    "dataset": "gnomad_r4",
    "reference_genome": "GRCh38"
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")

# Summarize by consequence
from collections import Counter
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
    print(f"  {c}: {n}")

# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
    af = v["genome"]["af"] if v["genome"] else None
    print(f"  {v['variant_id']} | AF={af:.2e}" if af else f"  {v['variant_id']} | AF=NA")
```

## Key Concepts

### gnomAD Data Model

gnomAD v4 has two datasets: `gnomad_r4` (exomes + genomes, GRCh38, 730K+ individuals) and `gnomad_r2_1` (GRCh37, 141K individuals). The API uses a GraphQL schema where variants are accessed either through `gene()`, `region()`, or direct `variant()` lookups. Each variant has separate `exome` and `genome` frequency objects; the `genome` object is preferred for population frequency comparisons.

### Ancestry Groups

gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):

| Code | Population | Dataset size (approx) |
|------|-----------|----------------------|
| `afr` | African/African American | 76,000+ |
| `amr` | Admixed American | 45,000+ |
| `eas` | East Asian | 50,000+ |
| `fin` | Finnish | 24,000+ |
| `nfe` | Non-Finnish European | 400,000+ |
| `sas` | South Asian | 80,000+ |
| `asj` | Ashkenazi Jewish | 10,000+ |
| `mid` | Middle Eastern | 5,000+ |
| `oth` | Other/Unknown | varies |

### Filtering Allele Frequency (FAF95)

The `faf95` field provides a one-sided 95% confidence interval lower bound on the allele frequency in the population where the variant is most common. Use this for conservative variant filtering in clinical pipelines — a variant with `faf95.popmax < 0.001` is likely rare enough to warrant clinical investigation.

### Constraint Scores

| Score | Interpretation |
|-------|----------------|
| `pLI > 0.9` | Gene is intolerant to LoF — likely essential |
| `LOEUF < 0.35` | Strong LoF constraint (upper CI of oe ratio) |
| `mis_z > 3.09` | Gene shows significant missense constraint |
| `pLI < 0.1` | Gene tolerates LoF — homozygous LoF variants exist |

## Common Workflows

### Workflow 1: Rare Variant Frequency Report for a Gene

**Goal**: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.

```python
import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id gene_name
    variants(dataset: $dataset) {
      variant_id rsids chrom pos ref alt consequence lof lof_filter
      genome {
        an ac af
        faf95 { popmax popmax_population }
        populations { id ac an af }
      }
    }
  }
}
"""

gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": gene,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})

variants = data["gene"]["variants"]
rows = []
for v in variants:
    g = v.get("genome") or {}
    af = g.get("af")
    if af is None or af >= 0.01:   # keep only rare variants
        continue
    rows.append({
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v.get("consequence"),
        "lof": v.get("lof"),
        "af_genome": af,
        "ac": g.get("ac"),
        "an": g.get("an"),
        "faf95_popmax": g.get("faf95", {}).get("popmax"),
        "faf95_pop": g.get("faf95", {}).get("popmax_population"),
    })

df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)
print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant              1423
# synonymous_variant             512
# splice_region_variant          231
# stop_gained                    198
```

### Workflow 2: Ancestry-Stratified Frequency Visualization

**Goal**: Query a list of variants and produce a barplot of allele frequencies by ancestry group.

```python
import requests, time
import pandas as pd
import matplotlib.pyplot as plt

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an af }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
    "nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}

variant_id = "1-55039974-G-T"   # PCSK9 p.Tyr142Ter
data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": variant_id,
    "dataset": "gnomad_r4"
})

pops = data["variant"]["genome"]["populations"]
rows = [{"code": p["id"], "af": p["af"], "ac": p["ac"], "an": p["an"]}
        for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png  (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))
```

### Workflow 3: Constraint-Guided Gene Prioritization

**Goal**: Score a gene list by constraint metrics and flag LoF-intolerant genes.

```python
import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id gene_name
    gnomad_constraint {
      pLI pNull pRec mis_z
      lof { obs exp oe oe_ci { lower upper } }
    }
  }
}
"""

gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
    try:
        data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
        c = data["gene"]["gnomad_constraint"]
        records.append({
            "gene": gene,
            "pLI": c["pLI"],
            "LOEUF": c["lof"]["oe_ci"]["upper"],
            "mis_z": c["mis_z"],
            "lof_obs": c["lof"]["obs"],
            "lof_exp": c["lof"]["exp"],
            "lof_oe": c["lof"]["oe"],
        })
    except Exception as e:
        print(f"Warning: {gene} failed — {e}")
    time.sleep(0.5)

df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["pLI"] > 0.9
print(df[["gene", "pLI", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes: {df['lof_intolerant'].sum()}/{len(df)}")
```

## Key Parameters

| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|-----------|-------------------|---------|-----------------|--------|
| `dataset` | All variant queries | — | `gnomad_r4`, `gnomad_r2_1`, `gnomad_r3` | Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1) |
| `reference_genome` | gene(), region() | — | `GRCh38`, `GRCh37` | Coordinate system; must match dataset |
| `variant_id` | variant() | — | `CHROM-POS-REF-ALT` string | Identifies the specific variant to query |
| `gene_symbol` | gene() | — | HGNC symbol string | Gene to retrieve; case-insensitive |
| `chrom`, `start`, `stop` | region() | — | valid genomic coordinates | Region boundaries for region queries |
| `faf95.popmax` | variant() genome | — | float 0–1 | Filtering allele frequency (95% CI upper bound); use < 0.001 for rare |
| `lof` filter field | gene() variants | — | `"HC"` (high-confidence), `"LC"` | LoF confidence level |
| `populations.id` | genome.populations | — | `afr`, `amr`, `eas`, `fin`, `nfe`, `sas`, `asj`, `mid`, `oth` | Per-ancestry frequency |

## Best Practices

1. **Use `gnomad_r4` for GRCh38 analyses**: gnomAD v4 is the most current dataset with 730K+ individuals. Use `gnomad_r2_1` only when comparing to GRCh37-based variant calls.

2. **Use `faf95.popmax` for clinical filtering, not overall AF**: The filtering allele frequency accounts for maximum population stratification and provides a more conservative rarity estimate than the global AF.

3. **Add `time.sleep(0.5)` in batch loops**: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays prevent server-side throttling.

4. **Filter `lof == "HC"` for LoF burden analyses**: Low-confidence LoF (`"LC"`) annotations are often in repetitive regions or may be sequencing artifacts. High-confidence (`"HC"`) calls are filtered by LOFTEE.

5. **Check AN before interpreting AF**: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.

## Common Recipes

### Recipe: Check if a Variant Is Common in Any Population

When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.

```python
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
    query = """
    query($variant_id: String!, $dataset: DatasetId!) {
      variant(variant_id: $variant_id, dataset: $dataset) {
        genome { faf95 { popmax popmax_population } af }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                                         "variables": {"variant_id": variant_id, "dataset": dataset}},
                      timeout=15)
    data = r.json()["data"]["variant"]
    if not data or not data["genome"]:
        return None, "Variant not found in gnomAD"
    af = data["genome"]["af"]
    popmax = data["genome"]["faf95"]["popmax"]
    pop = data["genome"]["faf95"]["popmax_population"]
    is_common = (popmax or 0) >= threshold
    return is_common, f"overall AF={af:.2e}, FAF95 popmax={popmax:.2e} in {pop}"

common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common}  |  {info}")
# Common: False  |  overall AF=3.2e-05, FAF95 popmax=6.4e-05 in nfe
```

### Recipe: Batch Constraint Lookup

When to use: Score multiple genes from a differential expression or GWAS gene list.

```python
import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_constraint(gene_symbol, reference_genome="GRCh38"):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gnomad_constraint { pLI mis_z lof { oe_ci { upper } } }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
                      timeout=15)
    data = r.json().get("data", {}).get("gene", {})
    if not data or not data.get("gnomad_constraint"):
        return None
    c = data["gnomad_constraint"]
    return {"gene": gene_symbol, "pLI": c["pLI"], "LOEUF": c["lof"]["oe_ci"]["upper"], "mis_z": c["mis_z"]}

genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = [r for g in genes for r in [get_constraint(g)] if r]
time.sleep(0.5)   # polite delay per gene in real loop

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# gene     pLI   LOEUF  mis_z
# BRCA1  0.999   0.127   3.84
# BRCA2  1.000   0.176   3.21
```

### Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.

```python
import requests, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        variants(dataset: $dataset) {
          variant_id rsids chrom pos ref alt consequence lof
          genome { af ac an }
        }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": "GRCh38", "dataset": dataset}},
                      timeout=60)
    variants = r.json()["data"]["gene"]["variants"]
    lof = [v for v in variants
           if v.get("lof") == "HC"
           and v.get("genome") and v["genome"].get("af") is not None
           and v["genome"]["af"] < max_af]
    return pd.DataFrame([{
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v["consequence"],
        "af": v["genome"]["af"],
        "ac": v["genome"]["ac"],
    } for v in lof])

df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)
```

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `{"errors": [...]}` from GraphQL | Invalid field name, wrong dataset ID, or null gene | Check field names match gnomAD v4 schema; use `gnomad_r4` not `gnomad_v4` |
| Variant returns `None` genome object | Variant only in exome data, not genome | Try accessing `exome` field instead of `genome`; genome is absent for exome-only variants |
| Gene query returns empty variants list | Gene symbol not found or mismatch | Verify HGNC symbol (case-sensitive); use `gene_id` (ENSG ID) as fallback |
| `faf95` returns `null` | Variant is absent or monomorphic in all populations | Check `ac` and `an` — variant may have AC=0 or be filtered |
| `requests.exceptions.Timeout` | Large gene (e.g., TTN) takes >30s | Increase `timeout=120`; for very large genes use region queries instead |
| Population AF is `None` for some groups | Variant not observed in that ancestry | Treat `None` AF as 0 for filtering; check `an` to confirm the group was sequenced |
| `reference_genome` mismatch error | Using GRCh37 coords with `gnomad_r4` | Use `GRCh38` for `gnomad_r4`/`gnomad_r3`; use `GRCh37` only for `gnomad_r2_1` |

## Related Skills

- `clinvar-database` — ClinVar pathogenicity classifications (complement to gnomAD population frequency data)
- `gwas-database` — GWAS Catalog for SNP-trait associations from published GWAS studies
- `ensembl-database` — Ensembl VEP for variant consequence prediction and gene annotation
- `dbsnp-database` — dbSNP for rsID lookup, variant classes, and cross-database ID mapping

## References

- [gnomAD GraphQL API](https://gnomad.broadinstitute.org/api) — Interactive GraphQL explorer and endpoint documentation
- [Karczewski et al., Nature 2020](https://doi.org/10.1038/s41586-020-2308-7) — gnomAD v2.1 flagship paper (constraint metrics, LoF analysis)
- [gnomAD Help & FAQ](https://gnomad.broadinstitute.org/help) — Data model, ancestry definitions, FAF95 explanation
- [gnomAD v4 blog post](https://gnomad.broadinstitute.org/news/2023-11-gnomad-v4-0/) — gnomAD v4 release notes and dataset composition