---
name: tooluniverse-gwas-study-explorer
description: Compare GWAS studies, perform meta-analyses, and assess replication across cohorts. Integrates NHGRI-EBI GWAS Catalog and Open Targets Genetics to compare study designs, effect sizes, ancestry diversity, and heterogeneity statistics. Use when comparing GWAS studies for a trait, performing meta-analysis of genetic loci, assessing replication across cohorts, or exploring the genetic architecture of complex diseases.
---

# GWAS Study Deep Dive & Meta-Analysis

**Compare GWAS studies, perform meta-analyses, and assess replication across cohorts**

---

## Overview

The GWAS Study Deep Dive & Meta-Analysis skill compares genome-wide association studies (GWAS) for the same trait, meta-analyzes genetic loci across studies, and systematically assesses replication and study quality. It integrates data from the NHGRI-EBI GWAS Catalog and Open Targets Genetics to describe the genetic architecture of complex traits.

### Key Capabilities

1. **Study Comparison**: Compare all GWAS studies for a trait, assessing sample sizes, ancestries, and platforms
2. **Meta-Analysis**: Aggregate effect sizes across studies and calculate heterogeneity statistics
3. **Replication Assessment**: Identify replicated vs novel findings across discovery and replication cohorts
4. **Quality Evaluation**: Assess statistical power, ancestry diversity, and data availability

---

## Use Cases

### 1. Comprehensive Trait Analysis

**Scenario**: "I want to understand all available GWAS data for type 2 diabetes"

**Workflow**:
- Search for all T2D studies in GWAS Catalog
- Filter by sample size and ancestry
- Extract top associations from each study
- Identify consistently replicated loci
- Assess ancestry-specific effects

**Outcome**: Complete landscape of T2D genetics with replicated findings and population-specific signals

### 2. Locus-Specific Meta-Analysis

**Scenario**: "Is the TCF7L2 association with T2D consistent across all studies?"

**Workflow**:
- Retrieve all TCF7L2 (rs7903146) associations for T2D
- Calculate combined effect size and p-value
- Assess heterogeneity (I² statistic)
- Generate forest plot data
- Interpret heterogeneity level

**Outcome**: Quantitative assessment of effect size consistency with heterogeneity interpretation

### 3. Replication Analysis

**Scenario**: "Which findings from the discovery cohort replicated in the independent sample?"

**Workflow**:
- Get top hits from discovery study
- Check for presence and significance in replication study
- Assess direction consistency
- Calculate replication rate
- Separate replicated findings from failed replications

**Outcome**: Systematic replication report with success rates and failed findings

### 4. Multi-Ancestry Comparison

**Scenario**: "Are T2D loci consistent across European and East Asian populations?"

**Workflow**:
- Filter studies by ancestry
- Compare top associations between populations
- Identify shared vs population-specific loci
- Assess allele frequency differences
- Evaluate transferability of genetic risk scores

**Outcome**: Ancestry-specific genetic architecture with transferability assessment

---

## Statistical Methods

### Meta-Analysis Approach

This skill implements standard GWAS meta-analysis methods:

**Fixed-Effects Model**:
- Used when heterogeneity is low (I² < 25%)
- Weights studies by inverse variance
- Assumes true effect size is the same across studies

**Random-Effects Model** (recommended when I² > 50%):
- Accounts for between-study variation
- More conservative than fixed-effects (wider confidence intervals)
- Better for diverse ancestries or methodologies

**Heterogeneity Assessment**:

The **I² statistic** measures the percentage of total variance in effect estimates that is due to between-study heterogeneity rather than sampling error:

```
I² = max(0, (Q - df) / Q) × 100%

where Q  = Cochran's Q statistic
      df = degrees of freedom (n_studies - 1)
```

I² is set to 0 when Q is smaller than df.

**Interpretation Guidelines**:
- **I² < 25%**: Low heterogeneity → fixed-effects appropriate
- **I² = 25-50%**: Moderate heterogeneity → investigate sources
- **I² = 50-75%**: Substantial heterogeneity → random-effects preferred
- **I² > 75%**: Considerable heterogeneity → meta-analysis may not be appropriate

### Sources of Heterogeneity

Common reasons for high I²:

1. **Ancestry differences**: Different allele frequencies and LD structure
2. **Phenotype heterogeneity**: Trait definition varies across studies
3. **Platform differences**: Imputation quality and coverage
4. **Winner's curse**: Discovery studies overestimate effect sizes
5. **Cohort characteristics**: Age, sex, environmental factors

**Recommendations**:
- Perform subgroup analysis by ancestry
- Use meta-regression to investigate sources
- Consider excluding outlier studies
- Apply genomic control correction

---

## Study Quality Assessment

### Quality Metrics

The skill evaluates studies based on:

**1. Sample Size**:
- Power to detect associations (as a rough guide, 80% power at genome-wide significance requires n > 10,000 for OR = 1.2; the exact figure depends on allele frequency and case-control ratio)
- Precision of effect size estimates
- Ability to detect modest effects

**2. Ancestry Diversity**:
- Single-ancestry vs multi-ancestry
- Population stratification control
- Transferability of findings

**3. Data Availability**:
- Summary statistics available for meta-analysis
- Individual-level data vs summary-level
- Imputation quality scores

**4. Genotyping Quality**:
- Platform density and coverage
- Imputation reference panel
- Quality control measures

**5. Statistical Rigor**:
- Genome-wide significance threshold (p < 5×10⁻⁸)
- Multiple testing correction
- Replication in independent cohort

### Quality Tiers

**Tier 1 (High Quality)**:
- n ≥ 50,000
- Summary statistics available
- Multi-ancestry or large single-ancestry
- Imputed to high-quality reference
- Independent replication

**Tier 2 (Moderate Quality)**:
- n ≥ 10,000
- Standard GWAS platform
- Adequate power for common variants
- Some data availability

**Tier 3 (Limited)**:
- n < 10,000
- Limited power
- May miss modest effects
- Use with caution

---

## Best Practices

### Before Meta-Analysis

1. **Check phenotype consistency**: Ensure studies measure the same trait
2. **Verify ancestry overlap**: High heterogeneity expected if ancestries differ
3. **Harmonize alleles**: Align effect alleles across studies
4. **Quality control**: Exclude low-quality studies or associations

### Interpreting Results

1. **Genome-wide significance**: p < 5×10⁻⁸ (Bonferroni for ~1M independent tests)
2. **Replication threshold**: p < 0.05 in an independent cohort, corrected for the number of variants tested
3. **Direction consistency**: Effect should be in the same direction across studies
4. **Heterogeneity**: I² > 50% suggests caution in interpretation

### Common Pitfalls

❌ **Don't**:
- Meta-analyze without checking heterogeneity
- Ignore ancestry differences
- Over-interpret nominal p-values
- Assume replication failure means false positive

✅ **Do**:
- Always report I² statistic
- Perform sensitivity analyses
- Consider ancestry-stratified analysis
- Account for winner's curse in discovery studies

---

## Limitations & Caveats

### Data Limitations

1. **Incomplete Overlap**: Studies may analyze different SNPs
2. **Cohort Overlap**: Some cohorts participate in multiple studies (inflates significance)
3. **Publication Bias**: Significant findings more likely to be published
4. **Winner's Curse**: Discovery studies overestimate effect sizes
5. **Imputation Quality**: Varies across studies and populations

### Statistical Limitations

1. **Heterogeneity**: High I² may preclude meaningful meta-analysis
2. **Sample Size Differences**: Large studies dominate fixed-effects models
3. **Allele Frequency Differences**: The same variant can differ in frequency, and in estimated effect, across ancestries
4. **Linkage Disequilibrium**: Fine-mapping needed to identify causal variants
5. **Gene-Environment Interactions**: Not captured in standard meta-analysis

### Interpretation Guidelines

**When I² > 75%**:
- Meta-analysis results should be interpreted with extreme caution
- Investigate sources of heterogeneity systematically
- Consider ancestry-specific or subgroup analyses
- Descriptive comparison may be more appropriate than meta-analysis

**When Studies Conflict**:
- Check for methodological differences
- Verify phenotype definitions match
- Investigate population stratification
- Consider conditional analysis

---

## Scientific References

### Key Publications

1. **GWAS Best Practices**:
   - Visscher et al. (2017). "10 Years of GWAS Discovery" *American Journal of Human Genetics* 101(1): 5-22
   - PMID: 28686856
   - DOI: 10.1016/j.ajhg.2017.06.005

2. **Meta-Analysis Methods**:
   - Evangelou & Ioannidis (2013). "Meta-analysis methods for genome-wide association studies and beyond" *Nature Reviews Genetics* 14: 379-389
   - PMID: 23657481

3. **Heterogeneity Interpretation**:
   - Higgins et al. (2003). "Measuring inconsistency in meta-analyses" *BMJ* 327: 557-560
   - PMID: 12958120

4. **Multi-Ancestry GWAS**:
   - Peterson et al. (2019). "Genome-wide Association Studies in Ancestrally Diverse Populations" *Nature Reviews Genetics* 20: 409-422
   - PMID: 30926972

5. **Replication Standards**:
   - Chanock et al. (2007). "Replicating genotype-phenotype associations" *Nature* 447: 655-660
   - PMID: 17554299

---

## Tools Used

### GWAS Catalog API

- `gwas_search_studies`: Find studies by trait
- `gwas_get_study_by_id`: Get detailed study metadata
- `gwas_get_associations_for_study`: Retrieve study associations
- `gwas_get_associations_for_snp`: Get SNP associations across studies
- `gwas_search_associations`: Search associations by trait

### Open Targets Genetics GraphQL API

- `OpenTargets_search_gwas_studies_by_disease`: Disease-based study search
- `OpenTargets_get_gwas_study`: Detailed study information with LD populations
- `OpenTargets_get_variant_credible_sets`: Fine-mapped loci for variant
- `OpenTargets_get_study_credible_sets`: All credible sets for study
- `OpenTargets_get_variant_info`: Variant annotation and allele frequencies

---

## Glossary

**Association**: Statistical relationship between a genetic variant and a trait

**Credible Set**: Set of variants likely to contain the causal variant (from fine-mapping)

**Effect Size**: Magnitude of genetic association (beta coefficient or odds ratio)

**Fine-Mapping**: Statistical method to identify causal variants within a locus

**Genome-Wide Significance**: p < 5×10⁻⁸, accounting for ~1M independent tests

**Heterogeneity (I²)**: Percentage of variance due to between-study differences

**L2G (Locus-to-Gene)**: Score predicting which gene is affected by a GWAS locus

**LD (Linkage Disequilibrium)**: Non-random association of alleles at different loci

**Meta-Analysis**: Statistical combination of results from multiple studies

**Replication**: Independent confirmation of an association in a new cohort

**Summary Statistics**: Per-SNP statistics (p-value, beta, SE) from GWAS

**Winner's Curse**: Overestimation of effect size in discovery studies

---

## Next Steps

After running this skill, consider:

1. **Fine-Mapping**: Use credible sets from Open Targets to identify causal variants
2. **Functional Follow-Up**: Investigate biological mechanisms of replicated loci
3. **Genetic Risk Scores**: Calculate polygenic risk scores using validated loci
4. **Drug Target Identification**: Use L2G scores to prioritize therapeutic targets
5. **Cross-Trait Analysis**: Look for pleiotropy with related traits

---

## Version History

- **v1.0** (2026-02-13): Initial release with study comparison, meta-analysis, and replication assessment

---

**Created by**: ToolUniverse GWAS Analysis Team
**Last Updated**: 2026-02-13
**License**: Open source (MIT)
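
---

## Worked Example: Pooling Effect Sizes

The meta-analysis approach described under Statistical Methods (inverse-variance weighting, Cochran's Q, I², and a switch to random-effects when I² exceeds 50%) can be sketched in a few lines of Python. This is a minimal illustration and not the skill's actual implementation. The names `meta_analyze` and `MetaResult` are hypothetical, and the DerSimonian-Laird estimator for the between-study variance is an assumption, since the text above does not say which random-effects estimator is used. Inputs are assumed to be per-study effect sizes on an additive scale (betas or log odds ratios) with their standard errors, already harmonized to the same effect allele.

```python
import math
from typing import NamedTuple, Sequence


class MetaResult(NamedTuple):
    beta: float     # pooled effect (beta or log odds ratio)
    se: float       # standard error of the pooled effect
    p_value: float  # two-sided p-value, normal approximation
    q: float        # Cochran's Q
    i2: float       # I² in percent, truncated at 0
    model: str      # "fixed" or "random"


def _pool(betas: Sequence[float], weights: Sequence[float]) -> tuple[float, float]:
    """Weighted mean of the effects and its standard error."""
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def meta_analyze(
    betas: Sequence[float],
    ses: Sequence[float],
    i2_threshold: float = 50.0,  # switch point taken from the guidelines above
) -> MetaResult:
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching betas and SEs")

    # Fixed-effects model: weight each study by its inverse variance.
    w = [1.0 / se ** 2 for se in ses]
    beta_fe, se_fe = _pool(betas, w)

    # Cochran's Q and I² = max(0, (Q - df) / Q) * 100.
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    if i2 > i2_threshold:
        # Random-effects model with the DerSimonian-Laird estimate of tau²
        # (assumed estimator; others such as REML are also common).
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_re = [1.0 / (se ** 2 + tau2) for se in ses]
        beta, se = _pool(betas, w_re)
        model = "random"
    else:
        beta, se, model = beta_fe, se_fe, "fixed"

    # Two-sided p-value for z = beta / se under a standard normal.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return MetaResult(beta, se, p_value, q, i2, model)
```

For example, `meta_analyze([0.10, 0.50], [0.05, 0.05])` gives Q = 32 on 1 degree of freedom, so I² is about 96.9% and the function falls back to the random-effects model, with a pooled standard error of 0.2 instead of the fixed-effects value of about 0.035. With I² that high, the guidelines above would suggest a descriptive comparison over a pooled estimate.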