---
name: tooluniverse-gwas-finemapping
description: Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.
---

# GWAS Fine-Mapping & Causal Variant Prioritization

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.

## Overview

Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. **Fine-mapping** uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.

This skill provides tools to:
- **Prioritize causal variants** using fine-mapping posterior probabilities
- **Link variants to genes** using locus-to-gene (L2G) predictions
- **Annotate variants** with functional consequences
- **Suggest validation strategies** based on fine-mapping results

## Key Concepts

### Credible Sets

A **credible set** is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a **posterior probability** of being causal, computed using methods like:
- **SuSiE** (Sum of Single Effects)
- **FINEMAP** (Bayesian fine-mapping)
- **PAINTOR** (Probabilistic Annotation INtegraTOR)

### Posterior Probability

The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
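To make the link between posterior probabilities and credible sets concrete, here is a minimal sketch of the simplest fine-mapping approach: Wakefield's approximate Bayes factors under the assumption of exactly one causal variant per locus, with an equal prior for every variant. This is not SuSiE or FINEMAP, and it is not how this skill's `python_implementation` works (the skill retrieves precomputed results from Open Targets). The function names, the prior effect-size SD of 0.2, and the toy summary statistics are all hypothetical choices made for illustration. Because it assumes a single causal signal, this approach needs no LD matrix, which is also why it breaks down at multi-signal loci.

```python
import math


def wakefield_log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (causal vs. null) for one variant, after Wakefield (2009)."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect size (assumed prior_sd = 0.2)
    r = w / (v + w)        # shrinkage factor
    z = beta / se          # Wald z-statistic
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def posterior_probabilities(betas, ses, prior_sd=0.2):
    """Posterior inclusion probabilities, assuming exactly one causal variant
    in the region and an equal prior probability for every variant."""
    log_abfs = [wakefield_log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    max_log = max(log_abfs)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(x - max_log) for x in log_abfs]
    total = sum(weights)
    return [wt / total for wt in weights]


def build_credible_set(variant_ids, pips, coverage=0.95):
    """Smallest set of variants whose cumulative posterior probability reaches `coverage`."""
    ranked = sorted(zip(variant_ids, pips), key=lambda pair: pair[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant_id, pip in ranked:
        selected.append((variant_id, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Hypothetical summary statistics for four variants at one locus
variants = ["var_a", "var_b", "var_c", "var_d"]
betas = [0.30, 0.29, 0.10, 0.02]
ses = [0.04, 0.04, 0.04, 0.04]

pips = posterior_probabilities(betas, ses)
cs_95 = build_credible_set(variants, pips, coverage=0.95)

for variant_id, pip in cs_95:
    print(f"{variant_id}: PIP = {pip:.3f}")
```

With these toy numbers, `var_a` and `var_b` carry roughly 0.85 and 0.15 of the posterior mass, so neither alone reaches 95% and the credible set contains both. Greedily adding variants in descending PIP order is what makes the set minimal.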
### Locus-to-Gene (L2G) Predictions

L2G scores integrate multiple data types to predict which gene at a GWAS locus is most likely causal:
- Distance from the variant to the gene (closer genes generally score higher)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)

L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.

## Use Cases

### 1. Prioritize Variants at a Known Locus

**Question**: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```

### 2. Fine-Map a Specific Variant

**Question**: "What do we know about rs429358 (APOE4) from fine-mapping?"

```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    top_gene = cs.l2g_genes[0].gene_symbol if cs.l2g_genes else "N/A"
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {top_gene}")
    print(f"Confidence: {cs.confidence}")
```

### 3. Explore All Loci from a GWAS Study

**Question**: "What are all the causal loci from the recent T2D meta-analysis?"
```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    lead = cs.lead_variant
    # Guard against a missing lead variant or one without an rsID
    lead_rsid = lead.rs_ids[0] if lead and lead.rs_ids else "N/A"
    print(f"Lead variant: {lead_rsid}")
    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```

### 4. Find GWAS Studies for a Disease

**Question**: "What GWAS studies exist for Alzheimer's disease?"

```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249",  # EFO ID for Alzheimer's disease
)
```

### 5. Get Validation Suggestions

**Question**: "How should we validate the top causal variant?"
```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```

## Workflow Example: Complete Fine-Mapping Analysis

```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study,
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
if not studies:
    raise SystemExit("No T2D GWAS studies found")
# Treat missing or null sample sizes as 0 so max() can compare them
largest_study = max(studies, key=lambda s: s.get('nSamples') or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci where a gene of interest has an L2G prediction
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in (cs.l2g_genes or []))
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "=" * 60)
print("FINE-MAPPING SUMMARY")
print("=" * 60)
print(result.get_summary())

print("\n" + "=" * 60)
print("VALIDATION STRATEGY")
print("=" * 60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
```

## Data Classes

### `FineMappingResult`

Main result object containing:
- `query_variant`: Variant annotation
- `query_gene`: Gene symbol (if queried by gene)
- `credible_sets`: List of fine-mapped loci
- `associated_traits`: All associated traits
- `top_causal_genes`: L2G genes ranked by score

Methods:
- `get_summary()`: Human-readable summary
- `get_validation_suggestions()`: Experimental validation strategies

### `CredibleSet`

Represents a fine-mapped locus:
- `study_locus_id`: Unique identifier
- `region`: Genomic region (e.g., "10:112861809-113404438")
- `lead_variant`: Top variant by posterior probability
- `finemapping_method`: Statistical method used (SuSiE, FINEMAP, etc.)
- `l2g_genes`: Locus-to-gene predictions
- `confidence`: Credible set confidence (95%, 99%)

### `L2GGene`

Locus-to-gene prediction:
- `gene_symbol`: Gene name (e.g., "TCF7L2")
- `gene_id`: Ensembl gene ID
- `l2g_score`: Probability score (0-1)

### `VariantAnnotation`

Functional annotation for a variant:
- `variant_id`: Open Targets format (chr_pos_ref_alt)
- `rs_ids`: dbSNP identifiers
- `chromosome`, `position`: Genomic coordinates
- `most_severe_consequence`: Functional impact
- `allele_frequencies`: Population-specific allele frequencies

## Tools Used

### Open Targets Genetics (GraphQL)
- `OpenTargets_get_variant_info`: Variant details and allele frequencies
- `OpenTargets_get_variant_credible_sets`: Credible sets containing a variant
- `OpenTargets_get_credible_set_detail`: Detailed credible set information
- `OpenTargets_get_study_credible_sets`: All loci from a GWAS study
- `OpenTargets_search_gwas_studies_by_disease`: Find studies by disease

### GWAS Catalog (REST API)
- `gwas_search_snps`: Find SNPs by gene or rsID
- `gwas_get_snp_by_id`: Detailed SNP information
- `gwas_get_associations_for_snp`: All trait associations for a variant
- `gwas_search_studies`: Find studies by disease/trait

## Understanding Fine-Mapping Output

### Interpreting Posterior Probabilities

- **> 0.5**: Very likely causal (strong candidate)
- **0.1 - 0.5**: Plausible causal variant
- **0.01 - 0.1**: Possible but uncertain
- **< 0.01**: Unlikely to be causal

### Interpreting L2G Scores

- **> 0.7**: High confidence gene-variant link
- **0.5 - 0.7**: Moderate confidence
- **0.3 - 0.5**: Weak but possible link
- **< 0.3**: Low confidence

### Fine-Mapping Methods Compared

| Method | Approach | Strengths | Use Case |
|--------|----------|-----------|----------|
| **SuSiE** | Sum of Single Effects regression | Handles multiple causal variants | Multi-signal loci |
| **FINEMAP** | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| **PAINTOR** | Bayesian model with functional annotation priors | Integrates epigenomics | Regulatory variants |
| **CAVIAR** | Enumeration of causal configurations | Allows multiple causal variants; the eCAVIAR extension tests colocalization | Small loci; eQTL overlap (via eCAVIAR) |

## Common Questions

**Q: Why don't all variants have credible sets?**

A: Fine-mapping requires:
1. GWAS summary statistics (not just top hits)
2. LD reference panel
3. Sufficient signal strength (p < 5e-8)
4. Computational resources

**Q: Can a variant be in multiple credible sets?**

A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.

**Q: What if the top L2G gene is far from the variant?**

A: This often points to a regulatory mechanism, for example a variant in an enhancer that contacts the gene's promoter through long-range chromatin looping. Check:
- eQTL evidence in relevant tissues
- Chromatin interaction data (Hi-C)
- Regulatory element annotations (Roadmap, ENCODE)

**Q: How do I choose between variants in a credible set?**

A: Prioritize by:
1. Posterior probability (higher = better)
2. Functional consequence (coding > regulatory > intergenic)
3. eQTL evidence
4. Evolutionary conservation
5. Experimental feasibility

## Limitations

1. **LD-dependent**: Fine-mapping accuracy depends on how closely the LD reference panel matches the study population
2. **Requires summary stats**: Not all studies provide full summary statistics
3. **Computationally intensive**: Fine-mapping large studies takes significant resources
4. **Prior assumptions**: Bayesian methods depend on priors (number of causal variants, effect sizes)
5. **Missing data**: Not all GWAS loci have been fine-mapped in Open Targets

## Best Practices

1. **Start with study-level queries** when exploring a new disease
2. **Check multiple studies** for replication of signals
3. **Combine with functional data** (eQTLs, chromatin, CRISPR screens)
4. **Consider ancestry** - LD differs across populations
5. **Validate experimentally** - fine-mapping provides candidates, not proof

## References

1. Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." *JRSS-B* (SuSiE)
2. Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." *Bioinformatics*
3. Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." *NAR*
4. Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." *Nat Genet*

## Related Skills

- **tooluniverse-gwas-explorer**: Broader GWAS analysis
- **tooluniverse-eqtl-colocalization**: Link variants to gene expression
- **tooluniverse-gene-prioritization**: Systematic gene ranking