{
"cells": [
{
"cell_type": "markdown",
"id": "6c3fa849",
"metadata": {},
"source": [
"# Drug responses - Background traits, PharmGKB"
]
},
{
"cell_type": "markdown",
"id": "ed4d72db",
"metadata": {},
"source": [
"## Table of contents\n",
"\n",
"1. [ClinVar](#Data-from-ClinVar)\n",
"    1. [Thoughts](#Thoughts)\n",
"2. [PharmGKB](#PharmGKB-data)\n",
"    1. [Clinical annotations](#Clinical-annotations)\n",
"    2. [Example extraction](#Example-extraction)\n",
"    3. [Connecting with ClinVar](#Connecting-with-ClinVar)\n",
"    4. [Star alleles](#Star-alleles)\n",
"    5. [Notes](#Notes)\n",
"3. [General](#General)\n",
"    1. [Meeting notes](#Meeting-notes)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e598c978",
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"import sys\n",
"\n",
"sys.path.append('..')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9afa2e97",
"metadata": {},
"outputs": [],
"source": [
"from filter_clinvar_xml import filter_xml, pprint, iterate_cvs_from_xml\n",
"from clinvar_xml_io.clinvar_xml_io import *"
]
},
{
"cell_type": "markdown",
"id": "35684cff",
"metadata": {},
"source": [
"## Data from ClinVar\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"Questions to address:\n",
"\n",
"* Can we reliably get the background trait, i.e. the disease that the drug acts on?\n",
"* How many records explicitly report efficacy phenotypes?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "86d81a4b",
"metadata": {},
"outputs": [],
"source": [
"# July 2022 data\n",
"drug_xml = '/home/april/projects/opentargets/drug-response.xml.gz'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b6ee82e0",
"metadata": {},
"outputs": [],
"source": [
"dataset = ClinVarDataset(drug_xml)"
]
},
{
"cell_type": "code",
"execution_count": 237,
"id": "65fe438f",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" current\n",
" NM_000769.4(CYP2C19):c.-806C>A AND clopidogrel response - Dosage, Efficacy, Toxicity/ADR\n",
" \n",
" \n",
" current\n",
" \n",
" reviewed by expert panel\n",
" drug response\n",
" \n",
" \n",
" \n",
" \n",
" germline\n",
" human\n",
" yes\n",
" \n",
" \n",
" curation\n",
" \n",
" \n",
" not provided\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" NM_000769.4(CYP2C19):c.-806C>A\n",
" \n",
" \n",
" NM_000769.2(CYP2C19):c.-806C>A\n",
" \n",
" NC_000010.11:94761899:C:A\n",
" \n",
" NG_055436.1:g.1260C>A\n",
" \n",
" \n",
" NG_008384.3:g.4220C>A\n",
" \n",
" \n",
" NC_000010.11:g.94761900C>A\n",
" \n",
" \n",
" NC_000010.10:g.96521657C>A\n",
" \n",
" \n",
" 10q23.33\n",
" \n",
" \n",
" \n",
" \n",
" cytochrome P450 family 2 subfamily C member 19\n",
" \n",
" \n",
" CYP2C19\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" CYP2C19 promoter\n",
" \n",
" \n",
" LOC110599570\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" 21716271\n",
" 3234301\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" NM_000769.4(CYP2C19):c.-806C>A\n",
" \n",
" \n",
" NM_000769.4(CYP2C19):c.-806C>A\n",
" \n",
" \n",
" NM_000769.4(CYP2C19):c.-806C>A\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" clopidogrel response - Dosage, Efficacy, Toxicity/ADR\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" current\n",
" \n",
" reviewed by expert panel\n",
" drug response\n",
" \n",
" 19463375\n",
" \n",
" \n",
" 20083681\n",
" \n",
" \n",
" 20492469\n",
" \n",
" \n",
" 20801498\n",
" \n",
" \n",
" 20826260\n",
" \n",
" \n",
" 21392617\n",
" \n",
" \n",
" 22028352\n",
" \n",
" \n",
" 22190063\n",
" \n",
" \n",
" 22228204\n",
" \n",
" \n",
" 22462746\n",
" \n",
" \n",
" 22704413\n",
" \n",
" \n",
" 22955794\n",
" \n",
" \n",
" 22990067\n",
" \n",
" \n",
" 23364775\n",
" \n",
" \n",
" 23726091\n",
" \n",
" \n",
" 23809542\n",
" \n",
" \n",
" 23922007\n",
" \n",
" \n",
" 24019397\n",
" \n",
" PharmGKB Level of Evidence 1A: Annotation for a variant-drug combination in a CPIC or medical society-endorsed PGx guideline, or implemented at a PGRN site or in another major health system.\n",
" \n",
" \n",
" \n",
" \n",
" Pharmacogenomics knowledge for personalized medicine\n",
" \n",
" 22992668\n",
" \n",
" \n",
" \n",
" \n",
" germline\n",
" human\n",
" yes\n",
" \n",
" \n",
" curation\n",
" \n",
" \n",
" not provided\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" NC_000010.10:g.96521657C>A\n",
" \n",
" \n",
" \n",
" CYP2C19\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" clopidogrel response - Dosage, Efficacy, Toxicity/ADR\n",
" \n",
" \n",
" \n",
" Acute coronary syndrome\n",
" \n",
" \n",
" \n",
" \n",
" Coronary Artery Disease\n",
" \n",
" \n",
" \n",
" \n",
" Myocardial Infarction\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" https://www.pharmgkb.org/clinicalAnnotation/655386913\n",
" \n",
" Drug is not necessarily used to treat response condition\n",
" \n",
"\n",
"\n",
"\n"
]
}
],
"source": [
"# Entire CVS record (RCV + SCV) for reference\n",
"for raw_cvs_xml in iterate_cvs_from_xml(drug_xml):\n",
"    pprint(raw_cvs_xml)\n",
"    break"
]
},
{
"cell_type": "markdown",
"id": "9dfc45d1",
"metadata": {},
"source": [
"Example: [RCV000211201](https://www.ncbi.nlm.nih.gov/clinvar/RCV000211201/) contains a trait relationship between the drug response and the disease, but only in the SCV record, not in the RCV record. (Note that this RCV has only one SCV.)\n",
"\n",
"**SCV:**\n",
"\n",
"```\n",
"\n",
" \n",
" \n",
" clopidogrel response - Dosage, Efficacy, Toxicity/ADR\n",
" \n",
" \n",
" \n",
" Acute coronary syndrome\n",
" \n",
" \n",
" \n",
" \n",
" Coronary Artery Disease\n",
" \n",
" \n",
" \n",
" \n",
" Myocardial Infarction\n",
" \n",
" \n",
" \n",
"\n",
"```\n",
"\n",
"**RCV:**\n",
"```\n",
"\n",
" \n",
" \n",
" clopidogrel response - Dosage, Efficacy, Toxicity/ADR\n",
" \n",
" \n",
" \n",
"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "aec271c5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RCV001824998\n",
"['Cabozantinib resistance', 'Entrectinib resistance', 'Larotrectinib resistance', 'Repotrectinib resistance', 'Selitrectinib resistance']\n"
]
}
],
"source": [
"# Check whether any of the RCV records have this kind of information\n",
"for record in dataset:\n",
"    if len(record.trait_set) > 1:\n",
"        # No trait set with both a drug and a disease\n",
"        print(record.accession)\n",
"        print([trait.preferred_or_other_valid_name for trait in record.trait_set])\n",
"    for trait in record.trait_set:\n",
"        # No traits in RCV with relationship element\n",
"        relationships = find_elements(trait.trait_xml, './TraitRelationship')\n",
"        if relationships:\n",
"            print(record.accession)\n",
"            pprint(trait.trait_xml)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "19bd85f7",
"metadata": {},
"outputs": [],
"source": [
"def get_name(x):\n",
"    # Preferred (or other valid) name of a Trait or TraitRelationship element\n",
"    return ClinVarTrait(x, None).preferred_or_other_valid_name\n",
"\n",
"\n",
"def is_pgkb(raw_cvs_xml):\n",
"    # True if any SCV in this record was submitted by PharmGKB\n",
"    scvs = find_elements(raw_cvs_xml, './ClinVarAssertion/ClinVarSubmissionID')\n",
"    submitters = {scv.attrib.get('submitter') for scv in scvs}\n",
"    return 'PharmGKB' in submitters"
]
},
{
"cell_type": "code",
"execution_count": 239,
"id": "2bd06c38",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"*hmg coa reductase inhibitors response - Toxicity => ['statin-related myopathy']\n",
"*nicotine response - Toxicity => ['Tobacco Use Disorder']\n",
"*azathioprine response - Toxicity => ['Inflammatory Bowel Diseases', 'Myelosuppression']\n",
"Piroxicam response => ['Pain', 'Inflammation', 'Osteoarthritis', 'Rheumatoid arthritis']\n",
"*halothane response - Toxicity => ['Malignant Hyperthermia']\n",
"*warfarin response - Toxicity/ADR => ['Over-anticoagulation']\n",
"*efavirenz response - Metabolism/PK => ['HIV Infections']\n",
"Prednisolone response => ['Minimal change disease']\n",
"efavirenz response => ['HIV']\n",
"Deutetrabenazine response => ['Chorea', 'Huntington disease', 'Tardive dyskinesia']\n",
"Lesinurad response => ['Gout']\n",
"*rosuvastatin response - Efficacy => ['Hypercholesterolemia', 'Myocardial Infarction']\n",
"Dabrafenib response => ['Pancreatic Adenocarcinoma']\n",
"*tobramycin response - Toxicity => ['Ototoxicity']\n",
"*peginterferon alfa-2b and ribavirin response - Toxicity => ['Anemia', 'Hepatitis C, Chronic']\n",
"*captopril response - Efficacy => ['Diabetes Mellitus, Type 2', 'Heart Failure', 'Pulmonary Disease, Chronic Obstructive']\n",
"Everolimus response => [None]\n",
"Dopamine agonist response => ['Macroprolactinoma']\n",
"Imatinib response => [None]\n",
"Corticosteroid response => ['Chronic kidney disease']\n",
"*Platinum compounds response - Efficacy => ['Neoplasms']\n",
"*streptomycin response - Toxicity => ['Ototoxicity']\n",
"Warfarin response => ['hemorrhage']\n",
"*atorvastatin response - Toxicity => ['statin-related myopathy']\n",
"Anti-PDL1 response => ['Cancer']\n",
"*simvastatin response - Toxicity => ['statin-related myopathy']\n",
"*gefitinib response - Efficacy => ['Carcinoma, Non-Small-Cell Lung', 'Drug Resistance']\n",
"*hydrochlorothiazide response - Efficacy => ['Essential hypertension', 'Hypertension']\n",
"*interferons, peginterferon alfa-2a, peginterferon alfa-2b and ribavirin response - Efficacy => ['Hepatitis C, Chronic']\n",
"*fluorouracil response - Toxicity => ['Neoplasms']\n",
"*desflurane response - Toxicity => ['Malignant Hyperthermia']\n",
"*methotrexate response - Metabolism/PK => ['Burkitt Lymphoma', 'Leukemia', 'Lymphoma', 'Lymphoma, T-Cell', 'Precursor Cell Lymphoblastic Leukemia-Lymphoma']\n",
"*nevirapine response - Toxicity => ['Epidermal Necrolysis, Toxic', 'Stevens-Johnson Syndrome']\n",
"Phenytoin response => ['status epilepticus']\n",
"Regorafenib response => ['Colorectal Neoplasms']\n",
"None => ['Non-small cell lung cancer']\n",
"*atorvastatin response - Efficacy => ['Coronary Disease', 'Hyperlipidemias']\n",
"*ivacaftor / lumacaftor response - Efficacy => ['Cystic Fibrosis']\n",
"Histone Methylation Therapy response => ['Cancer']\n",
"*peginterferon alfa-2a, peginterferon alfa-2b, ribavirin and telaprevir response - Efficacy => ['Hepatitis C, Chronic']\n",
"RAS Inhibitor response => ['Cancer']\n",
"*pravastatin response - Efficacy => ['Coronary Disease', 'Myocardial Infarction']\n",
"deoxygalactonojirimycin response => ['Fabry disease']\n",
"*methoxyflurane response - Toxicity => ['Malignant Hyperthermia']\n",
"*phenprocoumon response - Toxicity => ['Hemorrhage', 'over-anticoagulation', 'time above therapeutic range']\n",
"*efavirenz response - Toxicity => ['HIV Infections']\n",
"*tegafur response - Toxicity => ['Neoplasms']\n",
"MEK Inhibitor response => ['Cancer']\n",
"*ivacaftor / tezacaftor response - Efficacy => ['Cystic Fibrosis']\n",
"*enflurane response - Toxicity => ['Malignant Hyperthermia']\n",
"AKT1 Inhibitor response => ['Cancer']\n",
"*rosuvastatin response - Metabolism/PK => ['Hypercholesterolemia']\n",
"*methotrexate response - Toxicity => ['Arthritis, Juvenile Rheumatoid', 'Arthritis, Psoriatic', 'Arthritis, Rheumatoid', 'Drug Toxicity', 'Leukopenia', 'Neoplasms', 'Neutropenia', 'Osteosarcoma', 'Precursor Cell Lymphoblastic Leukemia-Lymphoma', 'Thrombocytopenia', 'Toxic liver disease', 'hematotoxicity', 'mucositis', 'primary central nervous system lymphoma']\n",
"*salmeterol response - Efficacy => ['Asthma']\n",
"*peginterferon alfa-2a, peginterferon alfa-2b and ribavirin response - Efficacy => ['Hepatitis C']\n",
"*acenocoumarol response - Dosage => ['Atrial Fibrillation']\n",
"Corticosteroid response => ['Minimal Change disease']\n",
"Flurbiprofen response => ['Pain', 'Inflammation', 'Osteoarthritis', 'Rheumatoid Arthritis', 'Bursitis', 'Tendinitis']\n",
"WEE1 Inhibitor response => ['Cancer']\n",
"*peginterferon alfa-2b response - Efficacy => ['HIV Infections', 'Hepatitis C']\n",
"*ethanol response - Toxicity => ['Alcoholism']\n",
"*etanercept response - Efficacy => ['Arthritis, Psoriatic', 'Arthritis, Rheumatoid', 'Crohn Disease', 'Inflammation', 'Psoriasis', 'Spondylitis, Ankylosing']\n",
"*carbamazepine response - Dosage => ['Epilepsy']\n",
"*boceprevir, peginterferon alfa-2a, peginterferon alfa-2b and ribavirin response - Efficacy => ['Hepatitis C, Chronic']\n",
"*nevirapine response - Metabolism/PK => ['HIV Infections']\n",
"PARP Inhibitor response => ['Cancer']\n",
"*warfarin response - Toxicity => ['Hemorrhage', 'over-anticoagulation']\n",
"*capecitabine response - Toxicity => ['Neoplasms']\n",
"Azathioprine intolerance => ['myasthenia gravis']\n",
"Corticosteroid response => ['Minimal change disease']\n",
"mTOR Inhibitor response => ['Cancer']\n",
"*ribavirin response - Efficacy => ['HIV Infections', 'Hepatitis C']\n",
"Gentamicin response => ['Bacterial infection', 'Neonatal sepsis']\n",
"Androgen deprivation therapy response => ['Prostate neoplasm']\n",
"*succinylcholine response - Toxicity => ['Malignant Hyperthermia']\n",
"VEGF Inhibitors response => ['Cancer']\n",
"all trans retinoic acid (ATRA) response => ['Acute promyelocytic leukemia']\n",
"*tacrolimus response - Metabolism/PK => ['Kidney Transplantation', 'Proteinuria', 'liver transplantation']\n",
"Vemurafenib-Cobimetinib Response => ['Melanoma']\n",
"Corticosteroid response => ['Focal segmental glomerulosclerosis']\n",
"Trametinib-Dabrafenib Response => ['Melanoma']\n",
"*gentamicin response - Toxicity => ['Ototoxicity']\n",
"*aminoglycoside antibacterials response - Toxicity => ['Ototoxicity']\n",
"*clopidogrel response - Dosage, Efficacy, Toxicity/ADR => ['Acute coronary syndrome', 'Coronary Artery Disease', 'Myocardial Infarction']\n",
"Gemcitabine response => ['non-small cell lung cancer']\n",
"Corticosteroid response => ['Nephrotic syndrome']\n",
"*kanamycin response - Toxicity => ['Ototoxicity']\n",
"Pazopanib response => ['malignant granular cell tumor']\n",
"*ivacaftor response - Efficacy => ['Cystic Fibrosis']\n",
"*methotrexate response - Efficacy => ['Arthritis, Rheumatoid']\n",
"*erlotinib response - Efficacy => ['Adenocarcinoma', 'Carcinoma, Non-Small-Cell Lung', 'Drug Resistance', 'Lung Neoplasms']\n",
"*amikacin response - Toxicity => ['Ototoxicity']\n",
"*isoflurane response - Toxicity => ['Malignant Hyperthermia']\n",
"Gefitinib Response => ['Non-small cell lung carcinoma']\n",
"Erlotinib Response => ['Non-small cell lung carcinoma']\n",
"None => ['Leukemia', 'Inflammatory bowel disease', 'Rheumatoid arthritis', 'Non-Hodgkin lymphoma']\n",
"*gefitinib response - Efficacy => ['Carcinoma, Non-Small-Cell Lung']\n",
"*sevoflurane response - Toxicity => ['Malignant Hyperthermia']\n",
"Tamoxifen response => ['Breast cancer']\n",
"*irinotecan response - Toxicity => ['Neutropenia']\n",
"*peginterferon alfa-2a response - Efficacy => ['HIV Infections', 'Hepatitis C']\n",
"Doxorubicin response => [None]\n",
"Prednisolone response => ['Focal segmental glomerulosclerosis 2']\n",
"Suxamethonium response - slow metabolism => ['Butyrylcholinesterase deficiency']\n"
]
}
],
"source": [
"# Check whether all the SCV records have this kind of information\n",
"# Note: counts are incremented per matching SCV Trait element, so a record\n",
"# with several such traits is counted more than once.\n",
"n = 0\n",
"count_all = 0\n",
"count_pgkb = 0\n",
"all_strs = set()\n",
"for raw_cvs_xml in iterate_cvs_from_xml(drug_xml):\n",
"    n += 1\n",
"    elts = find_elements(raw_cvs_xml, './ClinVarAssertion/TraitSet/Trait')\n",
"    for e in elts:\n",
"        if e.attrib['Type'] == 'DrugResponse':\n",
"            relations = find_elements(e, './TraitRelationship')\n",
"            name = get_name(e)\n",
"            background_traits = []\n",
"            for r in relations:\n",
"                if r.attrib['Type'] == 'DrugResponseAndDisease':\n",
"                    background_traits.append(get_name(r))\n",
"            if background_traits:\n",
"                count_all += 1\n",
"                # PharmGKB submissions are marked with a leading '*'\n",
"                if is_pgkb(raw_cvs_xml):\n",
"                    count_pgkb += 1\n",
"                    all_strs.add(f'*{name} => {background_traits}')\n",
"                else:\n",
"                    all_strs.add(f'{name} => {background_traits}')\n",
"\n",
"for s in all_strs:\n",
"    print(s)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "034e22bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Out of 4970 records, found 576 with drug response & disease relationship (361 from PharmGKB).\n"
]
}
],
"source": [
"print(f'Out of {n} records, found {count_all} with drug response & disease relationship ({count_pgkb} from PharmGKB).')"
]
},
{
"cell_type": "code",
"execution_count": 235,
"id": "9f1b0156",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Count SCV drug response traits whose name mentions efficacy\n",
"count_all = 0\n",
"count_pgkb = 0\n",
"for raw_cvs_xml in iterate_cvs_from_xml(drug_xml):\n",
"    elts = find_elements(raw_cvs_xml, './ClinVarAssertion/TraitSet/Trait')\n",
"    for e in elts:\n",
"        if e.attrib['Type'] == 'DrugResponse':\n",
"            name = get_name(e)\n",
"            if name and 'efficacy' in name.lower():\n",
"                count_all += 1\n",
"                if is_pgkb(raw_cvs_xml):\n",
"                    count_pgkb += 1"
]
},
{
"cell_type": "code",
"execution_count": 236,
"id": "a9ac0e22",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Out of 4970 records, found 54 with efficacy phenotype (54 from PharmGKB).\n"
]
}
],
"source": [
"print(f'Out of {n} records, found {count_all} with efficacy phenotype ({count_pgkb} from PharmGKB).')"
]
},
{
"cell_type": "markdown",
"id": "96095f19",
"metadata": {},
"source": [
"### Thoughts\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"* Is it worth starting to parse SCV for drug response / disease trait relationships?\n",
" * Might be relatively straightforward to do in this restricted case\n",
" * Opens up a can of worms, e.g. what happens if SCVs don't agree? Do we end up redoing the work of aggregation?\n",
"* Why does ClinVar exclude this info from the RCV anyway?\n",
"* Is it worth trying other ways of linking drug & disease within ClinVar?\n",
" * e.g. different RCV with same VCV, one for drug and one for disease\n",
" * same SCV associated with different RCVs via different traits?\n",
"* Counts summary: **4970** drug response records\n",
" * **401** with PharmGKB submission (previous notebook)\n",
" * **576** with drug response & disease relationship (in SCV only)\n",
" * Of these, **361** from PharmGKB\n",
" * **54** with explicit efficacy phenotype, all from PharmGKB"
]
},
{
"cell_type": "markdown",
"id": "d8680977",
"metadata": {},
"source": [
"## PharmGKB data\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"* Compare this with what PharmGKB submissions contain in ClinVar\n",
"* Also consider how we would get consequences and how we'd connect to ClinVar data"
]
},
{
"cell_type": "markdown",
"id": "7ef6c4ab",
"metadata": {},
"source": [
"General PharmGKB notes:\n",
"\n",
"* [Multiple datasets](https://www.pharmgkb.org/downloads) that we could cross-reference\n",
"    * I looked at some of the others but the clinical annotations are probably all we need/can use\n",
"* \"PharmGKB submits Level 1 & 2 Clinical Annotations PGx into ClinVar\" - see [levels](https://www.pharmgkb.org/page/clinAnnLevels)"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "1f68c3b8",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os\n",
"from IPython.display import display"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "d5a832f0",
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_colwidth', None)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "4ef959c6",
"metadata": {},
"outputs": [],
"source": [
"pharmgkb_root = '/home/april/projects/opentargets/pharmgkb'"
]
},
{
"cell_type": "markdown",
"id": "3443b7dd",
"metadata": {},
"source": [
"### Clinical annotations\n",
"\n",
"[Top of page](#Table-of-contents)"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "5613fc2a",
"metadata": {},
"outputs": [],
"source": [
"clinical_annotations = pd.read_csv(os.path.join(pharmgkb_root, 'clinical', 'clinical_annotations.tsv'), sep='\\t')\n",
"clinical_alleles = pd.read_csv(os.path.join(pharmgkb_root, 'clinical', 'clinical_ann_alleles.tsv'), sep='\\t')\n",
"clinical_evidence = pd.read_csv(os.path.join(pharmgkb_root, 'clinical', 'clinical_ann_evidence.tsv'), sep='\\t')"
]
},
{
"cell_type": "code",
"execution_count": 132,
"id": "d8183890",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5013"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(clinical_annotations)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "4d97c071",
"metadata": {},
"outputs": [],
"source": [
"def show_id(i):\n",
"    # Display the rows for one clinical annotation ID from all three tables\n",
"    for t in (clinical_annotations[clinical_annotations['Clinical Annotation ID'] == i],\n",
"              clinical_alleles[clinical_alleles['Clinical Annotation ID'] == i],\n",
"              clinical_evidence[clinical_evidence['Clinical Annotation ID'] == i]):\n",
"        display(t)"
]
},
{
"cell_type": "markdown",
"id": "e60a5eda",
"metadata": {},
"source": [
"Two examples: one with an rsID (981755803) and one with only a star allele (1451243980)."
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "78f3063a",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" Clinical Annotation ID Variant/Haplotypes Gene Level of Evidence \\\n",
"0 981755803 rs75527207 CFTR 1A \n",
"\n",
" Level Override Level Modifiers Score Phenotype Category \\\n",
"0 NaN Rare Variant; Tier 1 VIP 234.875 Efficacy \n",
"\n",
" PMID Count Evidence Count Drug(s) Phenotype(s) \\\n",
"0 28 30 ivacaftor Cystic Fibrosis \n",
"\n",
" Latest History Date (YYYY-MM-DD) \\\n",
"0 2021-03-24 \n",
"\n",
" URL Specialty Population \n",
"0 https://www.pharmgkb.org/clinicalAnnotation/981755803 Pediatric "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
" Clinical Annotation ID Genotype/Allele \\\n",
"0 981755803 AA \n",
"1 981755803 AG \n",
"2 981755803 GG \n",
"\n",
" Annotation Text \\\n",
"0 Patients with the rs75527207 AA genotype (two copies of the CFTR G551D variant) and cystic fibrosis may respond to ivacaftor treatment. FDA-approved drug labeling information and CPIC guidelines indicate use of ivacaftor in cystic fibrosis patients with at least one copy of a list of 33 CFTR genetic variants, including G551D. Other genetic and clinical factors may also influence response to ivacaftor. \n",
"1 Patients with the rs75527207 AG genotype (one copy of the CFTR G551D variant) and cystic fibrosis may respond to ivacaftor treatment. FDA-approved drug labeling information and CPIC guidelines indicate use of ivacaftor in cystic fibrosis patients with at least one copy of a list of 33 CFTR genetic variants, including G551D. Other genetic and clinical factors may also influence response to ivacaftor. \n",
"2 Patients with the rs75527207 GG genotype (do not have a copy of the CFTR G551D variant) and cystic fibrosis have an unknown response to ivacaftor treatment, as response may depend on the presence of other CFTR variants. FDA-approved drug labeling information and CPIC guidelines indicate use of ivacaftor in cystic fibrosis patients with at least one copy of a list of 33 CFTR genetic variants, including G551D. Other genetic and clinical factors may also influence response to ivacaftor. \n",
"\n",
" Allele Function \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Clinical Annotation ID | \n",
" Evidence ID | \n",
" Evidence Type | \n",
" Evidence URL | \n",
" PMID | \n",
" Summary | \n",
" Score | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 981755803 | \n",
" PA166114461 | \n",
" Guideline Annotation | \n",
" https://www.pharmgkb.org/guidelineAnnotation/PA166114461 | \n",
" NaN | \n",
" Annotation of CPIC Guideline for ivacaftor and CFTR | \n",
" 100 | \n",
"
\n",
" \n",
" 1 | \n",
" 981755803 | \n",
" PA166104890 | \n",
" Label Annotation | \n",
" https://www.pharmgkb.org/labelAnnotation/PA166104890 | \n",
" NaN | \n",
" Annotation of FDA Label for ivacaftor and CFTR | \n",
" 100 | \n",
"
\n",
" \n",
" 2 | \n",
" 981755803 | \n",
" 981755665 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/981755665 | \n",
" 21083385.0 | \n",
" Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 3 | \n",
" 981755803 | \n",
" 981755678 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/981755678 | \n",
" 22047557.0 | \n",
" Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 981755803 | \n",
" 982006840 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/982006840 | \n",
" 23313410.0 | \n",
" Allele A is associated with response to ivacaftor in men with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 5 | \n",
" 981755803 | \n",
" 982009991 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/982009991 | \n",
" 23590265.0 | \n",
" Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. | \n",
" 2.25 | \n",
"
\n",
" \n",
" 6 | \n",
" 981755803 | \n",
" 1043737597 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1043737597 | \n",
" 23757359.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 7 | \n",
" 981755803 | \n",
" 1043737620 | \n",
" Variant Functional Assay Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1043737620 | \n",
" 23757361.0 | \n",
" Allele A is associated with increased activity of CFTR when treated with ivacaftor in transfected CHO cells. | \n",
" 0.0 | \n",
"
\n",
" \n",
" 8 | \n",
" 981755803 | \n",
" 1043737636 | \n",
" Variant Functional Assay Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1043737636 | \n",
" 23891399.0 | \n",
" Allele A is associated with activity of CFTR when treated with ivacaftor in FRT cell lines. | \n",
" 0.0 | \n",
"
\n",
" \n",
" 9 | \n",
" 981755803 | \n",
" 1183629335 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1183629335 | \n",
" 24066763.0 | \n",
" Genotype AA is associated with response to ivacaftor in women with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 10 | \n",
" 981755803 | \n",
" 1448267532 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448267532 | \n",
" 27745802.0 | \n",
" Genotypes AA + AG is associated with decreased severity of bone density when treated with ivacaftor in people with Cystic Fibrosis as compared to genotype GG. | \n",
" 1.5 | \n",
"
\n",
" \n",
" 11 | \n",
" 981755803 | \n",
" 1448423752 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448423752 | \n",
" 27773592.0 | \n",
" Genotypes AA + AG is associated with increased response to ivacaftor in people with Cystic Fibrosis as compared to genotype GG. | \n",
" 0.875 | \n",
"
\n",
" \n",
" 12 | \n",
" 981755803 | \n",
" 1449191908 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449191908 | \n",
" 25682022.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 13 | \n",
" 981755803 | \n",
" 1449192031 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192031 | \n",
" 28651844.0 | \n",
" Allele A is associated with decreased likelihood of cystic fibrosis pulmonary exacerbation when treated with ivacaftor in people with Cystic Fibrosis. | \n",
" 3.0 | \n",
"
\n",
" \n",
" 14 | \n",
" 981755803 | \n",
" 1449192055 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192055 | \n",
" 28711222.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.25 | \n",
"
\n",
" \n",
" 15 | \n",
" 981755803 | \n",
" 1449192093 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192093 | \n",
" 25311995.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 0.0 | \n",
"
\n",
" \n",
" 16 | \n",
" 981755803 | \n",
" 1449192439 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192439 | \n",
" 28611235.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 1.5 | \n",
"
\n",
" \n",
" 17 | \n",
" 981755803 | \n",
" 1449192481 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192481 | \n",
" 26135562.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 18 | \n",
" 981755803 | \n",
" 1449192494 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192494 | \n",
" 25171465.0 | \n",
" Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 19 | \n",
" 981755803 | \n",
" 1449192576 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192576 | \n",
" 25755212.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 20 | \n",
" 981755803 | \n",
" 1449192615 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192615 | \n",
" 26568242.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.5 | \n",
"
\n",
" \n",
" 21 | \n",
" 981755803 | \n",
" 1449192709 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192709 | \n",
" 25473543.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 22 | \n",
" 981755803 | \n",
" 1449192721 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449192721 | \n",
" 25145599.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.5 | \n",
"
\n",
" \n",
" 23 | \n",
" 981755803 | \n",
" 1450043422 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1450043422 | \n",
" 23628510.0 | \n",
" Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. | \n",
" 3.0 | \n",
"
\n",
" \n",
" 24 | \n",
" 981755803 | \n",
" 1184512440 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1184512440 | \n",
" 25049054.0 | \n",
" Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 1.5 | \n",
"
\n",
" \n",
" 25 | \n",
" 981755803 | \n",
" 981755746 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/981755746 | \n",
" 22942289.0 | \n",
" Allele A is associated with increased response to ivacaftor. | \n",
" This annotation is not used for clinical annotation scoring. | \n",
"
\n",
" \n",
" 26 | \n",
" 981755803 | \n",
" 981755699 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/981755699 | \n",
" 19846789.0 | \n",
" Allele A is associated with increased response to ivacaftor. | \n",
" This annotation is not used for clinical annotation scoring. | \n",
"
\n",
" \n",
" 27 | \n",
" 981755803 | \n",
" 981755787 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/981755787 | \n",
" 22293084.0 | \n",
" Allele A is associated with increased response to ivacaftor. | \n",
" This annotation is not used for clinical annotation scoring. | \n",
"
\n",
" \n",
" 28 | \n",
" 981755803 | \n",
" 1446903789 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1446903789 | \n",
" 24461666.0 | \n",
" Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. | \n",
" 2.5 | \n",
"
\n",
" \n",
" 29 | \n",
" 981755803 | \n",
" 1448099051 | \n",
" Variant Drug Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448099051 | \n",
" 27158673.0 | \n",
" Genotypes AA + AG are associated with increased response to ivacaftor in people with Cystic Fibrosis as compared to genotype GG. | \n",
" 2.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Clinical Annotation ID Evidence ID Evidence Type \\\n",
"0 981755803 PA166114461 Guideline Annotation \n",
"1 981755803 PA166104890 Label Annotation \n",
"2 981755803 981755665 Variant Drug Annotation \n",
"3 981755803 981755678 Variant Drug Annotation \n",
"4 981755803 982006840 Variant Drug Annotation \n",
"5 981755803 982009991 Variant Drug Annotation \n",
"6 981755803 1043737597 Variant Drug Annotation \n",
"7 981755803 1043737620 Variant Functional Assay Annotation \n",
"8 981755803 1043737636 Variant Functional Assay Annotation \n",
"9 981755803 1183629335 Variant Drug Annotation \n",
"10 981755803 1448267532 Variant Phenotype Annotation \n",
"11 981755803 1448423752 Variant Drug Annotation \n",
"12 981755803 1449191908 Variant Drug Annotation \n",
"13 981755803 1449192031 Variant Phenotype Annotation \n",
"14 981755803 1449192055 Variant Drug Annotation \n",
"15 981755803 1449192093 Variant Drug Annotation \n",
"16 981755803 1449192439 Variant Drug Annotation \n",
"17 981755803 1449192481 Variant Drug Annotation \n",
"18 981755803 1449192494 Variant Drug Annotation \n",
"19 981755803 1449192576 Variant Drug Annotation \n",
"20 981755803 1449192615 Variant Drug Annotation \n",
"21 981755803 1449192709 Variant Drug Annotation \n",
"22 981755803 1449192721 Variant Drug Annotation \n",
"23 981755803 1450043422 Variant Drug Annotation \n",
"24 981755803 1184512440 Variant Drug Annotation \n",
"25 981755803 981755746 Variant Drug Annotation \n",
"26 981755803 981755699 Variant Drug Annotation \n",
"27 981755803 981755787 Variant Drug Annotation \n",
"28 981755803 1446903789 Variant Drug Annotation \n",
"29 981755803 1448099051 Variant Drug Annotation \n",
"\n",
" Evidence URL PMID \\\n",
"0 https://www.pharmgkb.org/guidelineAnnotation/PA166114461 NaN \n",
"1 https://www.pharmgkb.org/labelAnnotation/PA166104890 NaN \n",
"2 https://www.pharmgkb.org/variantAnnotation/981755665 21083385.0 \n",
"3 https://www.pharmgkb.org/variantAnnotation/981755678 22047557.0 \n",
"4 https://www.pharmgkb.org/variantAnnotation/982006840 23313410.0 \n",
"5 https://www.pharmgkb.org/variantAnnotation/982009991 23590265.0 \n",
"6 https://www.pharmgkb.org/variantAnnotation/1043737597 23757359.0 \n",
"7 https://www.pharmgkb.org/variantAnnotation/1043737620 23757361.0 \n",
"8 https://www.pharmgkb.org/variantAnnotation/1043737636 23891399.0 \n",
"9 https://www.pharmgkb.org/variantAnnotation/1183629335 24066763.0 \n",
"10 https://www.pharmgkb.org/variantAnnotation/1448267532 27745802.0 \n",
"11 https://www.pharmgkb.org/variantAnnotation/1448423752 27773592.0 \n",
"12 https://www.pharmgkb.org/variantAnnotation/1449191908 25682022.0 \n",
"13 https://www.pharmgkb.org/variantAnnotation/1449192031 28651844.0 \n",
"14 https://www.pharmgkb.org/variantAnnotation/1449192055 28711222.0 \n",
"15 https://www.pharmgkb.org/variantAnnotation/1449192093 25311995.0 \n",
"16 https://www.pharmgkb.org/variantAnnotation/1449192439 28611235.0 \n",
"17 https://www.pharmgkb.org/variantAnnotation/1449192481 26135562.0 \n",
"18 https://www.pharmgkb.org/variantAnnotation/1449192494 25171465.0 \n",
"19 https://www.pharmgkb.org/variantAnnotation/1449192576 25755212.0 \n",
"20 https://www.pharmgkb.org/variantAnnotation/1449192615 26568242.0 \n",
"21 https://www.pharmgkb.org/variantAnnotation/1449192709 25473543.0 \n",
"22 https://www.pharmgkb.org/variantAnnotation/1449192721 25145599.0 \n",
"23 https://www.pharmgkb.org/variantAnnotation/1450043422 23628510.0 \n",
"24 https://www.pharmgkb.org/variantAnnotation/1184512440 25049054.0 \n",
"25 https://www.pharmgkb.org/variantAnnotation/981755746 22942289.0 \n",
"26 https://www.pharmgkb.org/variantAnnotation/981755699 19846789.0 \n",
"27 https://www.pharmgkb.org/variantAnnotation/981755787 22293084.0 \n",
"28 https://www.pharmgkb.org/variantAnnotation/1446903789 24461666.0 \n",
"29 https://www.pharmgkb.org/variantAnnotation/1448099051 27158673.0 \n",
"\n",
" Summary \\\n",
"0 Annotation of CPIC Guideline for ivacaftor and CFTR \n",
"1 Annotation of FDA Label for ivacaftor and CFTR \n",
"2 Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"3 Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"4 Allele A is associated with response to ivacaftor in men with Cystic Fibrosis. \n",
"5 Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. \n",
"6 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"7 Allele A is associated with increased activity of CFTR when treated with ivacaftor in transfected CHO cells. \n",
"8 Allele A is associated with activity of CFTR when treated with ivacaftor in FRT cell lines. \n",
"9 Genotype AA is associated with response to ivacaftor in women with Cystic Fibrosis. \n",
"10 Genotypes AA + AG is associated with decreased severity of bone density when treated with ivacaftor in people with Cystic Fibrosis as compared to genotype GG. \n",
"11 Genotypes AA + AG is associated with increased response to ivacaftor in people with Cystic Fibrosis as compared to genotype GG. \n",
"12 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"13 Allele A is associated with decreased likelihood of cystic fibrosis pulmonary exacerbation when treated with ivacaftor in people with Cystic Fibrosis. \n",
"14 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"15 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"16 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"17 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"18 Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. \n",
"19 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"20 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"21 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"22 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"23 Allele A is associated with response to ivacaftor in children with Cystic Fibrosis. \n",
"24 Allele A is associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"25 Allele A is associated with increased response to ivacaftor. \n",
"26 Allele A is associated with increased response to ivacaftor. \n",
"27 Allele A is associated with increased response to ivacaftor. \n",
"28 Genotypes AA + AG are associated with response to ivacaftor in people with Cystic Fibrosis. \n",
"29 Genotypes AA + AG are associated with increased response to ivacaftor in people with Cystic Fibrosis as compared to genotype GG. \n",
"\n",
" Score \n",
"0 100 \n",
"1 100 \n",
"2 0.25 \n",
"3 2.0 \n",
"4 0.25 \n",
"5 2.25 \n",
"6 2.0 \n",
"7 0.0 \n",
"8 0.0 \n",
"9 0.25 \n",
"10 1.5 \n",
"11 0.875 \n",
"12 0.25 \n",
"13 3.0 \n",
"14 2.25 \n",
"15 0.0 \n",
"16 1.5 \n",
"17 2.0 \n",
"18 0.25 \n",
"19 2.0 \n",
"20 2.5 \n",
"21 0.25 \n",
"22 2.5 \n",
"23 3.0 \n",
"24 1.5 \n",
"25 This annotation is not used for clinical annotation scoring. \n",
"26 This annotation is not used for clinical annotation scoring. \n",
"27 This annotation is not used for clinical annotation scoring. \n",
"28 2.5 \n",
"29 2.0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"show_id(981755803)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "c19bec99",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Clinical Annotation ID | \n",
" Variant/Haplotypes | \n",
" Gene | \n",
" Level of Evidence | \n",
" Level Override | \n",
" Level Modifiers | \n",
" Score | \n",
" Phenotype Category | \n",
" PMID Count | \n",
" Evidence Count | \n",
" Drug(s) | \n",
" Phenotype(s) | \n",
" Latest History Date (YYYY-MM-DD) | \n",
" URL | \n",
" Specialty Population | \n",
"
\n",
" \n",
" \n",
" \n",
" 4996 | \n",
" 1451243980 | \n",
" CYP2B6*1, CYP2B6*2, CYP2B6*6, CYP2B6*18, CYP2B6*38 | \n",
" CYP2B6 | \n",
" 1A | \n",
" NaN | \n",
" Tier 1 VIP | \n",
" 211.5 | \n",
" Toxicity | \n",
" 12 | \n",
" 14 | \n",
" efavirenz | \n",
" HIV Infections | \n",
" 2021-03-24 | \n",
" https://www.pharmgkb.org/clinicalAnnotation/1451243980 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Clinical Annotation ID \\\n",
"4996 1451243980 \n",
"\n",
" Variant/Haplotypes Gene \\\n",
"4996 CYP2B6*1, CYP2B6*2, CYP2B6*6, CYP2B6*18, CYP2B6*38 CYP2B6 \n",
"\n",
" Level of Evidence Level Override Level Modifiers Score \\\n",
"4996 1A NaN Tier 1 VIP 211.5 \n",
"\n",
" Phenotype Category PMID Count Evidence Count Drug(s) \\\n",
"4996 Toxicity 12 14 efavirenz \n",
"\n",
" Phenotype(s) Latest History Date (YYYY-MM-DD) \\\n",
"4996 HIV Infections 2021-03-24 \n",
"\n",
" URL \\\n",
"4996 https://www.pharmgkb.org/clinicalAnnotation/1451243980 \n",
"\n",
" Specialty Population \n",
"4996 NaN "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Clinical Annotation ID | \n",
" Genotype/Allele | \n",
" Annotation Text | \n",
" Allele Function | \n",
"
\n",
" \n",
" \n",
" \n",
" 15404 | \n",
" 1451243980 | \n",
" *1 | \n",
" The CYP2B6*1 allele is assigned as a normal function allele by CPIC. Patients carrying CYP2B6*1 allele in combination with another normal function allele may have decreased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with a no or decreased function allele in combination with a normal or increased function allele or with two no or decreased function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the toxicity of efavirenz. | \n",
" Normal function | \n",
"
\n",
" \n",
" 15405 | \n",
" 1451243980 | \n",
" *2 | \n",
" The CYP2B6*2 allele is assigned as a normal function allele by CPIC. Patients carrying CYP2B6*2 allele in combination with another normal function allele may have decreased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with a no or decreased function allele in combination with a normal or increased function allele or with two no or decreased function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the toxicity of efavirenz. | \n",
" Normal function | \n",
"
\n",
" \n",
" 15406 | \n",
" 1451243980 | \n",
" *6 | \n",
" The CYP2B6*6 allele is assigned as a decreased function allele by CPIC. Patients carrying the CYP2B6*6 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence toxicity of efavirenz. | \n",
" Decreased function | \n",
"
\n",
" \n",
" 15407 | \n",
" 1451243980 | \n",
" *18 | \n",
" The CYP2B6*18 allele is assigned as a no function allele by CPIC. Patients carrying the CYP2B6*18 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence toxicity of efavirenz. | \n",
" No function | \n",
"
\n",
" \n",
" 15408 | \n",
" 1451243980 | \n",
" *38 | \n",
" The CYP2B6*38 allele is assigned as a no function allele by CPIC. Patients carrying the CYP2B6*38 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. Other genetic and clinical factors may also influence toxicity of efavirenz. | \n",
" No function | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Clinical Annotation ID Genotype/Allele \\\n",
"15404 1451243980 *1 \n",
"15405 1451243980 *2 \n",
"15406 1451243980 *6 \n",
"15407 1451243980 *18 \n",
"15408 1451243980 *38 \n",
"\n",
" Annotation Text \\\n",
"15404 The CYP2B6*1 allele is assigned as a normal function allele by CPIC. Patients carrying CYP2B6*1 allele in combination with another normal function allele may have decreased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with a no or decreased function allele in combination with a normal or increased function allele or with two no or decreased function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the toxicity of efavirenz. \n",
"15405 The CYP2B6*2 allele is assigned as a normal function allele by CPIC. Patients carrying CYP2B6*2 allele in combination with another normal function allele may have decreased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with a no or decreased function allele in combination with a normal or increased function allele or with two no or decreased function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the toxicity of efavirenz. \n",
"15406 The CYP2B6*6 allele is assigned as a decreased function allele by CPIC. Patients carrying the CYP2B6*6 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence toxicity of efavirenz. \n",
"15407 The CYP2B6*18 allele is assigned as a no function allele by CPIC. Patients carrying the CYP2B6*18 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence toxicity of efavirenz. \n",
"15408 The CYP2B6*38 allele is assigned as a no function allele by CPIC. Patients carrying the CYP2B6*38 allele in combination with a normal, decreased, no, or increased function allele may have increased risk of adverse events (eg. liver toxicity or CNS side effects) when treated with efavirenz as compared to patients with two normal function alleles. Other genetic and clinical factors may also influence toxicity of efavirenz. \n",
"\n",
" Allele Function \n",
"15404 Normal function \n",
"15405 Normal function \n",
"15406 Decreased function \n",
"15407 No function \n",
"15408 No function "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Clinical Annotation ID | \n",
" Evidence ID | \n",
" Evidence Type | \n",
" Evidence URL | \n",
" PMID | \n",
" Summary | \n",
" Score | \n",
"
\n",
" \n",
" \n",
" \n",
" 14695 | \n",
" 1451243980 | \n",
" PA166182603 | \n",
" Guideline Annotation | \n",
" https://www.pharmgkb.org/guidelineAnnotation/PA166182603 | \n",
" NaN | \n",
" Annotation of CPIC Guideline for efavirenz and CYP2B6 | \n",
" 100 | \n",
"
\n",
" \n",
" 14696 | \n",
" 1451243980 | \n",
" PA166182846 | \n",
" Guideline Annotation | \n",
" https://www.pharmgkb.org/guidelineAnnotation/PA166182846 | \n",
" NaN | \n",
" Annotation of DPWG Guideline for efavirenz and CYP2B6 | \n",
" 100 | \n",
"
\n",
" \n",
" 14697 | \n",
" 1451243980 | \n",
" 1451289240 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1451289240 | \n",
" 25889207.0 | \n",
" Allele C is not associated with increased likelihood of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections as compared to allele T. | \n",
" -1.5 | \n",
"
\n",
" \n",
" 14698 | \n",
" 1451243980 | \n",
" 1183634232 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1183634232 | \n",
" 24080498.0 | \n",
" Genotypes CC + CT are not associated with risk of Neurotoxicity Syndromes when treated with efavirenz in people with HIV Infections as compared to genotype TT. | \n",
" -1.75 | \n",
"
\n",
" \n",
" 14699 | \n",
" 1451243980 | \n",
" 1184473287 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1184473287 | \n",
" 24517233.0 | \n",
" Genotype TT is associated with increased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 14700 | \n",
" 1451243980 | \n",
" 1448636199 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448636199 | \n",
" 28692529.0 | \n",
" Genotype CC is associated with decreased likelihood of Drug Toxicity when treated with efavirenz in people with HIV Infections as compared to genotype TT. | \n",
" 2.0 | \n",
"
\n",
" \n",
" 14701 | \n",
" 1451243980 | \n",
" 1448993810 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448993810 | \n",
" 26715213.0 | \n",
" Genotypes CC + CT are associated with decreased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections as compared to genotype TT. | \n",
" 3.5 | \n",
"
\n",
" \n",
" 14702 | \n",
" 1451243980 | \n",
" 827707534 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/827707534 | \n",
" 21862974.0 | \n",
" CYP2B6 *6/*6 is associated with increased risk of drug-induced liver injury when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. | \n",
" 2.5 | \n",
"
\n",
" \n",
" 14703 | \n",
" 1451243980 | \n",
" 1184168515 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1184168515 | \n",
" 23734829.0 | \n",
" CYP2B6 *1 is not associated with Neurotoxicity Syndromes when treated with efavirenz in people with HIV as compared to CYP2B6 *6. | \n",
" -1.5 | \n",
"
\n",
" \n",
" 14704 | \n",
" 1451243980 | \n",
" 1448993721 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448993721 | \n",
" 22808112.0 | \n",
" CYP2B6 *6 is associated with increased risk of Toxic liver disease when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. | \n",
" 2.25 | \n",
"
\n",
" \n",
" 14705 | \n",
" 1451243980 | \n",
" 1448993746 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448993746 | \n",
" 27333947.0 | \n",
" CYP2B6 *6/*6 is associated with increased risk of Long QT Syndrome when exposed to efavirenz in healthy individuals as compared to CYP2B6 *1/*1. | \n",
" 1.75 | \n",
"
\n",
" \n",
" 14706 | \n",
" 1451243980 | \n",
" 1448994067 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1448994067 | \n",
" 17686225.0 | \n",
" CYP2B6 *2/*2 is associated with increased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. | \n",
" 0.25 | \n",
"
\n",
" \n",
" 14707 | \n",
" 1451243980 | \n",
" 1449156721 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449156721 | \n",
" 23640958.0 | \n",
" CYP2B6 *6 + *38 are associated with increased risk of Neurotoxicity Syndromes when treated with efavirenz as compared to CYP2B6 *1/*1. | \n",
" 0.0 | \n",
"
\n",
" \n",
" 14708 | \n",
" 1451243980 | \n",
" 1449156770 | \n",
" Variant Phenotype Annotation | \n",
" https://www.pharmgkb.org/variantAnnotation/1449156770 | \n",
" 24359841.0 | \n",
" CYP2B6 *6/*6 is associated with increased likelihood of Toxic liver disease when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. | \n",
" 2.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Clinical Annotation ID Evidence ID Evidence Type \\\n",
"14695 1451243980 PA166182603 Guideline Annotation \n",
"14696 1451243980 PA166182846 Guideline Annotation \n",
"14697 1451243980 1451289240 Variant Phenotype Annotation \n",
"14698 1451243980 1183634232 Variant Phenotype Annotation \n",
"14699 1451243980 1184473287 Variant Phenotype Annotation \n",
"14700 1451243980 1448636199 Variant Phenotype Annotation \n",
"14701 1451243980 1448993810 Variant Phenotype Annotation \n",
"14702 1451243980 827707534 Variant Phenotype Annotation \n",
"14703 1451243980 1184168515 Variant Phenotype Annotation \n",
"14704 1451243980 1448993721 Variant Phenotype Annotation \n",
"14705 1451243980 1448993746 Variant Phenotype Annotation \n",
"14706 1451243980 1448994067 Variant Phenotype Annotation \n",
"14707 1451243980 1449156721 Variant Phenotype Annotation \n",
"14708 1451243980 1449156770 Variant Phenotype Annotation \n",
"\n",
" Evidence URL PMID \\\n",
"14695 https://www.pharmgkb.org/guidelineAnnotation/PA166182603 NaN \n",
"14696 https://www.pharmgkb.org/guidelineAnnotation/PA166182846 NaN \n",
"14697 https://www.pharmgkb.org/variantAnnotation/1451289240 25889207.0 \n",
"14698 https://www.pharmgkb.org/variantAnnotation/1183634232 24080498.0 \n",
"14699 https://www.pharmgkb.org/variantAnnotation/1184473287 24517233.0 \n",
"14700 https://www.pharmgkb.org/variantAnnotation/1448636199 28692529.0 \n",
"14701 https://www.pharmgkb.org/variantAnnotation/1448993810 26715213.0 \n",
"14702 https://www.pharmgkb.org/variantAnnotation/827707534 21862974.0 \n",
"14703 https://www.pharmgkb.org/variantAnnotation/1184168515 23734829.0 \n",
"14704 https://www.pharmgkb.org/variantAnnotation/1448993721 22808112.0 \n",
"14705 https://www.pharmgkb.org/variantAnnotation/1448993746 27333947.0 \n",
"14706 https://www.pharmgkb.org/variantAnnotation/1448994067 17686225.0 \n",
"14707 https://www.pharmgkb.org/variantAnnotation/1449156721 23640958.0 \n",
"14708 https://www.pharmgkb.org/variantAnnotation/1449156770 24359841.0 \n",
"\n",
" Summary \\\n",
"14695 Annotation of CPIC Guideline for efavirenz and CYP2B6 \n",
"14696 Annotation of DPWG Guideline for efavirenz and CYP2B6 \n",
"14697 Allele C is not associated with increased likelihood of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections as compared to allele T. \n",
"14698 Genotypes CC + CT are not associated with risk of Neurotoxicity Syndromes when treated with efavirenz in people with HIV Infections as compared to genotype TT. \n",
"14699 Genotype TT is associated with increased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections. \n",
"14700 Genotype CC is associated with decreased likelihood of Drug Toxicity when treated with efavirenz in people with HIV Infections as compared to genotype TT. \n",
"14701 Genotypes CC + CT are associated with decreased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV Infections as compared to genotype TT. \n",
"14702 CYP2B6 *6/*6 is associated with increased risk of drug-induced liver injury when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. \n",
"14703 CYP2B6 *1 is not associated with Neurotoxicity Syndromes when treated with efavirenz in people with HIV as compared to CYP2B6 *6. \n",
"14704 CYP2B6 *6 is associated with increased risk of Toxic liver disease when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. \n",
"14705 CYP2B6 *6/*6 is associated with increased risk of Long QT Syndrome when exposed to efavirenz in healthy individuals as compared to CYP2B6 *1/*1. \n",
"14706 CYP2B6 *2/*2 is associated with increased risk of Central Nervous System Diseases when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. \n",
"14707 CYP2B6 *6 + *38 are associated with increased risk of Neurotoxicity Syndromes when treated with efavirenz as compared to CYP2B6 *1/*1. \n",
"14708 CYP2B6 *6/*6 is associated with increased likelihood of Toxic liver disease when treated with efavirenz in people with HIV as compared to CYP2B6 *1/*1. \n",
"\n",
" Score \n",
"14695 100 \n",
"14696 100 \n",
"14697 -1.5 \n",
"14698 -1.75 \n",
"14699 2.0 \n",
"14700 2.0 \n",
"14701 3.5 \n",
"14702 2.5 \n",
"14703 -1.5 \n",
"14704 2.25 \n",
"14705 1.75 \n",
"14706 0.25 \n",
"14707 0.0 \n",
"14708 2.0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"show_id(1451243980)"
]
},
{
"cell_type": "markdown",
"id": "48bb09fa",
"metadata": {},
"source": [
"### Example extraction\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"New data model extracted from PharmKGB clinical annotations download file:\n",
"* The trait in the evidence will be PharmGKB's “Phenotypes”\n",
"* The drug will be extracted from PharmGKB's “Drugs”\n",
"* The target will be the target associated with the variant, PharmGKB’s “Gene”\n",
"* Filter rows for those whose category is `Efficacy` and has associated `Phenotypes`"
]
},
{
"cell_type": "code",
"execution_count": 134,
"id": "cf8105dc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Clinical Annotation ID', 'Variant/Haplotypes', 'Gene',\n",
" 'Level of Evidence', 'Level Override', 'Level Modifiers', 'Score',\n",
" 'Phenotype Category', 'PMID Count', 'Evidence Count', 'Drug(s)',\n",
" 'Phenotype(s)', 'Latest History Date (YYYY-MM-DD)', 'URL',\n",
" 'Specialty Population'],\n",
" dtype='object')"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clinical_annotations.columns"
]
},
{
"cell_type": "code",
"execution_count": 139,
"id": "b7be01a7",
"metadata": {},
"outputs": [],
"source": [
"# Filter by efficacy\n",
"efficacy_annotations = clinical_annotations[clinical_annotations['Phenotype Category'] == 'Efficacy']"
]
},
{
"cell_type": "code",
"execution_count": 150,
"id": "4995b340",
"metadata": {},
"outputs": [],
"source": [
"# Keep relevant columns\n",
"efficacy_annotations = efficacy_annotations[\n",
" ['Clinical Annotation ID', 'Variant/Haplotypes', 'Gene',\n",
" 'Level of Evidence', 'Drug(s)', 'Phenotype(s)']]"
]
},
{
"cell_type": "code",
"execution_count": 162,
"id": "5c364dd6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1931"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(efficacy_annotations)"
]
},
{
"cell_type": "code",
"execution_count": 153,
"id": "471220b2",
"metadata": {},
"outputs": [],
"source": [
"# Join on alleles data\n",
"efficacy_with_alleles = efficacy_annotations.set_index('Clinical Annotation ID').join(clinical_alleles.set_index('Clinical Annotation ID'))"
]
},
{
"cell_type": "code",
"execution_count": 154,
"id": "51ebdc1b",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" Variant/Haplotypes Gene Level of Evidence \\\n",
"Clinical Annotation ID \n",
"613979021 rs1042714 ADRB2 3 \n",
"613979021 rs1042714 ADRB2 3 \n",
"613979021 rs1042714 ADRB2 3 \n",
"613979403 rs5443 GNB3 3 \n",
"613979403 rs5443 GNB3 3 \n",
"... ... ... ... \n",
"1451868520 rs11198893 GRK5 3 \n",
"1451868520 rs11198893 GRK5 3 \n",
"1451868540 rs4752292 GRK5 3 \n",
"1451868540 rs4752292 GRK5 3 \n",
"1451868540 rs4752292 GRK5 3 \n",
"\n",
" Drug(s) Phenotype(s) \\\n",
"Clinical Annotation ID \n",
"613979021 carvedilol Heart Failure \n",
"613979021 carvedilol Heart Failure \n",
"613979021 carvedilol Heart Failure \n",
"613979403 sumatriptan Cluster Headache \n",
"613979403 sumatriptan Cluster Headache \n",
"... ... ... \n",
"1451868520 Beta Blocking Agents Coronary Artery Disease \n",
"1451868520 Beta Blocking Agents Coronary Artery Disease \n",
"1451868540 Beta Blocking Agents Coronary Artery Disease \n",
"1451868540 Beta Blocking Agents Coronary Artery Disease \n",
"1451868540 Beta Blocking Agents Coronary Artery Disease \n",
"\n",
" Genotype/Allele \\\n",
"Clinical Annotation ID \n",
"613979021 CC \n",
"613979021 CG \n",
"613979021 GG \n",
"613979403 CC \n",
"613979403 CT \n",
"... ... \n",
"1451868520 AG \n",
"1451868520 GG \n",
"1451868540 GG \n",
"1451868540 GT \n",
"1451868540 TT \n",
"\n",
" Annotation Text \\\n",
"Clinical Annotation ID \n",
"613979021 Patients with the CC genotype and heart failure may have a poorer response to carvedilol treatment as compared to patients with the CG or GG genotype. Other genetic and clinical factors may also influence a patient's chance of response. \n",
"613979021 Patients with the CG genotype and heart failure may have a poorer response to carvedilol treatment as compared to patients with the GG genotype and a better response as compared to patients with the CC genotype. Patients with the CG genotype may still be at risk for non-response to carvedilol treatment based on their genotype. Other genetic and clinical factors may also influence a patient's chance of response. \n",
"613979021 Patients with the GG genotype and heart failure may have a better response to carvedilol treatment as compared to patients with the CC or CG genotype. Patients with the GG genotype may still be at risk for non-response to carvedilol treatment based on their genotype. Other genetic and clinical factors may also influence a patient's chance of response. \n",
"613979403 Patients with the CC genotype and cluster headache who are treated with triptans may be less likely to have reduced pain or attack frequency as compared to patients with the CT genotype. Other genetic and clinical factors may also influence a patient's response to sumatriptan. \n",
"613979403 Patients with the CT genotype and cluster headache who are treated with triptans may be more likely to have reduced pain or attack frequency as compared to patients with the CC genotype. Other genetic and clinical factors may also influence a patient's response to sumatriptan. \n",
"... ... \n",
"1451868520 Patients with the rs11198893 AG genotype and coronary artery disease may have decreased response when treated with beta blocking agents as compared to patients with the GG genotype. Other genetic and clinical factors may also influence response to beta blocking agents. \n",
"1451868520 Patients with the rs11198893 GG genotype and coronary artery disease may have increased response when treated with beta blocking agents as compared to patients with the AA or AG genotypes. Other genetic and clinical factors may also influence response to beta blocking agents. \n",
"1451868540 Patients with the rs4752292 GG genotype and coronary artery disease may have increased response when treated with beta blocking agents as compared to patients with the TT or GT genotypes. Other genetic and clinical factors may also influence response to beta blocking agents. \n",
"1451868540 Patients with the rs4752292 GT genotype and coronary artery disease may have decreased response when treated with beta blocking agents as compared to patients with the GG genotype. Other genetic and clinical factors may also influence response to beta blocking agents. \n",
"1451868540 Patients with the rs4752292 TT genotype and coronary artery disease may have decreased response when treated with beta blocking agents as compared to patients with the GG genotype. Other genetic and clinical factors may also influence response to beta blocking agents. \n",
"\n",
" Allele Function \n",
"Clinical Annotation ID \n",
"613979021 NaN \n",
"613979021 NaN \n",
"613979021 NaN \n",
"613979403 NaN \n",
"613979403 NaN \n",
"... ... \n",
"1451868520 NaN \n",
"1451868520 NaN \n",
"1451868540 NaN \n",
"1451868540 NaN \n",
"1451868540 NaN \n",
"\n",
"[5881 rows x 8 columns]"
]
},
"execution_count": 154,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"efficacy_with_alleles"
]
},
{
"cell_type": "code",
"execution_count": 161,
"id": "853b2e89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5881"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of genotype/allele rows (several per clinical annotation, as opposed to one per variant)\n",
"len(efficacy_with_alleles)"
]
},
{
"cell_type": "code",
"execution_count": 158,
"id": "2f11b4a9",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"126"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of entries with allele function\n",
"len(efficacy_with_alleles[pd.notna(efficacy_with_alleles['Allele Function'])])"
]
},
{
"cell_type": "code",
"execution_count": 160,
"id": "5c4242f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5659"
]
},
"execution_count": 160,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of entries whose variant is identified by an rsID\n",
"len(efficacy_with_alleles[efficacy_with_alleles['Variant/Haplotypes'].str.contains('rs')])"
]
},
{
"cell_type": "markdown",
"id": "9549ee4f",
"metadata": {},
"source": [
"### Connecting with ClinVar\n",
"\n",
"[Top of page](#Table-of-contents)"
]
},
{
"cell_type": "code",
"execution_count": 203,
"id": "70fcc847",
"metadata": {},
"outputs": [],
"source": [
"import re"
]
},
{
"cell_type": "code",
"execution_count": 230,
"id": "170de42a",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Link PharmGKB to ClinVar via the Clinical Annotation ID, which should appear in the xrefs\n",
"all_pgkb_ids = []\n",
"for raw_cvs_xml in iterate_cvs_from_xml(drug_xml):\n",
"    if is_pgkb(raw_cvs_xml):\n",
"        record = ClinVarRecord(find_mandatory_unique_element(raw_cvs_xml, 'ReferenceClinVarAssertion'))\n",
"        if record.measure:\n",
"            # Soundest approach: dedicated Clinical Annotation xref on the measure\n",
"            pgkb_ids = [\n",
"                int(elem.attrib['ID'])\n",
"                for elem in find_elements(record.measure.measure_xml, './XRef[@DB=\"PharmGKB Clinical Annotation\"]')\n",
"            ]\n",
"            if not pgkb_ids:\n",
"                # Fallback: generic PharmGKB xref, keeping the numeric prefix of the ID (yields a lot of redundancy)\n",
"                pgkb_ids = [\n",
"                    int(re.split(r'[a-zA-Z]+', elem.attrib['ID'])[0])\n",
"                    for elem in find_elements(record.measure.measure_xml, './XRef[@DB=\"PharmGKB\"]')\n",
"                ]\n",
"            if not pgkb_ids:\n",
"                # Last resort: take the ID from the citation URL of the submission (fragile, probably best avoided)\n",
"                pgkb_ids = [\n",
"                    int(elem.text.split('/')[-1])\n",
"                    for elem in find_elements(raw_cvs_xml, './ClinVarAssertion/ClinicalSignificance/Citation/URL')\n",
"                ]\n",
"            if not pgkb_ids:\n",
"                # No ID found by any route: print the record and stop so it can be inspected\n",
"                pprint(raw_cvs_xml)\n",
"                break\n",
"            all_pgkb_ids.extend(pgkb_ids)"
]
},
{
"cell_type": "code",
"execution_count": 231,
"id": "4bbc17b5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2000"
]
},
"execution_count": 231,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(all_pgkb_ids)"
]
},
{
"cell_type": "code",
"execution_count": 232,
"id": "c9926163",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"167"
]
},
"execution_count": 232,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Unique annotation IDs; cf. 401 ClinVar records with PharmGKB submissions\n",
"len(set(all_pgkb_ids))"
]
},
{
"cell_type": "code",
"execution_count": 234,
"id": "39bcfdb5",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" Clinical Annotation ID Variant/Haplotypes Gene \\\n",
"0 981755803 rs75527207 CFTR \n",
"3 1449191690 rs141033578 CFTR \n",
"4 1449191746 rs78769542 CFTR \n",
"27 655386913 CYP2C19*1, CYP2C19*17 CYP2C19 \n",
"159 981201854 rs28399499 CYP2B6 \n",
"... ... ... ... \n",
"4531 1451237940 rs9923231 VKORC1 \n",
"4533 1451243676 rs9923231 VKORC1 \n",
"4535 1451245360 rs1051266 SLC19A1 \n",
"4762 1449191758 rs75541969 CFTR \n",
"5001 1451289660 rs59086055 DPYD \n",
"\n",
" Level of Evidence Level Override Level Modifiers Score \\\n",
"0 1A NaN Rare Variant; Tier 1 VIP 234.875 \n",
"3 1A NaN Rare Variant; Tier 1 VIP 200.000 \n",
"4 1A NaN Rare Variant; Tier 1 VIP 200.000 \n",
"27 3 NaN Tier 1 VIP 6.000 \n",
"159 3 NaN Tier 1 VIP 5.250 \n",
"... ... ... ... ... \n",
"4531 1A NaN Tier 1 VIP 117.000 \n",
"4533 2A NaN Tier 1 VIP 8.250 \n",
"4535 2A NaN Tier 1 VIP 14.125 \n",
"4762 1A NaN Rare Variant; Tier 1 VIP 200.000 \n",
"5001 1A NaN Rare Variant; Tier 1 VIP 100.000 \n",
"\n",
" Phenotype Category PMID Count Evidence Count Drug(s) \\\n",
"0 Efficacy 28 30 ivacaftor \n",
"3 Efficacy 1 3 ivacaftor \n",
"4 Efficacy 1 3 ivacaftor \n",
"27 Toxicity 15 16 clopidogrel \n",
"159 Metabolism/PK 7 7 nevirapine \n",
"... ... ... ... ... \n",
"4531 Dosage 10 11 phenprocoumon \n",
"4533 Toxicity 3 4 phenprocoumon \n",
"4535 Efficacy 9 10 methotrexate \n",
"4762 Efficacy 1 3 ivacaftor \n",
"5001 Toxicity 1 2 fluorouracil \n",
"\n",
" Phenotype(s) \\\n",
"0 Cystic Fibrosis \n",
"3 Cystic Fibrosis \n",
"4 Cystic Fibrosis \n",
"27 Acute coronary syndrome;Coronary Artery Disease;Hemorrhage;Myocardial Infarction \n",
"159 HIV Infections \n",
"... ... \n",
"4531 NaN \n",
"4533 Hemorrhage;over-anticoagulation;time above therapeutic range \n",
"4535 Arthritis, Rheumatoid \n",
"4762 Cystic Fibrosis \n",
"5001 Neoplasms \n",
"\n",
" Latest History Date (YYYY-MM-DD) \\\n",
"0 2021-03-24 \n",
"3 2021-03-24 \n",
"4 2021-03-24 \n",
"27 2021-03-24 \n",
"159 2021-03-24 \n",
"... ... \n",
"4531 2021-03-24 \n",
"4533 2021-03-24 \n",
"4535 2021-03-24 \n",
"4762 2021-03-24 \n",
"5001 2021-03-24 \n",
"\n",
" URL \\\n",
"0 https://www.pharmgkb.org/clinicalAnnotation/981755803 \n",
"3 https://www.pharmgkb.org/clinicalAnnotation/1449191690 \n",
"4 https://www.pharmgkb.org/clinicalAnnotation/1449191746 \n",
"27 https://www.pharmgkb.org/clinicalAnnotation/655386913 \n",
"159 https://www.pharmgkb.org/clinicalAnnotation/981201854 \n",
"... ... \n",
"4531 https://www.pharmgkb.org/clinicalAnnotation/1451237940 \n",
"4533 https://www.pharmgkb.org/clinicalAnnotation/1451243676 \n",
"4535 https://www.pharmgkb.org/clinicalAnnotation/1451245360 \n",
"4762 https://www.pharmgkb.org/clinicalAnnotation/1449191758 \n",
"5001 https://www.pharmgkb.org/clinicalAnnotation/1451289660 \n",
"\n",
" Specialty Population \n",
"0 Pediatric \n",
"3 NaN \n",
"4 NaN \n",
"27 NaN \n",
"159 NaN \n",
"... ... \n",
"4531 Pediatric \n",
"4533 NaN \n",
"4535 NaN \n",
"4762 NaN \n",
"5001 NaN \n",
"\n",
"[161 rows x 15 columns]"
]
},
"execution_count": 234,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clinical_annotations[clinical_annotations['Clinical Annotation ID'].isin(set(all_pgkb_ids))]"
]
},
{
"cell_type": "markdown",
"id": "e0495ff2",
"metadata": {},
"source": [
"### Star alleles\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"A star allele, e.g. for [CYP2D6](https://www.ncbi.nlm.nih.gov/books/NBK574601/), corresponds to\n",
"> specific combinations of single nucleotide polymorphisms (SNPs) and/or small insertions and deletions (indels).... In addition, the CYP2D6 gene locus contains a number of complex structural variants including full gene deletions, gene duplications and multiplications [[via](https://www.nature.com/articles/s41525-020-0135-2)]\n",
"\n",
"`CYP2D6*1` is the reference allele; `CYP2D6*(gene variant)xN` refers to `N` copies of the gene.\n",
"\n",
"Nomenclature is really heterogeneous across genes (compare [HLA](http://hla.alleles.org/nomenclature/naming.html)), so there are lots of rabbit holes we could go down!\n",
"\n",
"Conversion to rsID / HGVS? e.g. in [PharmVar](https://www.pharmvar.org/gene/CYP2D6)\n",
"* has a [data download](https://www.pharmvar.org/download)\n",
"* also has an [API](https://www.pharmvar.org/documentation)!"
]
},
{
"cell_type": "code",
"execution_count": 170,
"id": "35cef091",
"metadata": {},
"outputs": [],
"source": [
"no_rs = efficacy_with_alleles[~efficacy_with_alleles['Variant/Haplotypes'].str.contains('rs')]['Variant/Haplotypes'].tolist()"
]
},
{
"cell_type": "code",
"execution_count": 238,
"id": "7ea17e44",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'CYP2B6*1, CYP2B6*4, CYP2B6*5, CYP2B6*6, CYP2B6*7',\n",
" 'CYP2B6*1, CYP2B6*5',\n",
" 'CYP2B6*1, CYP2B6*6',\n",
" 'CYP2C19*1, CYP2C19*2',\n",
" 'CYP2C19*1, CYP2C19*2, CYP2C19*3',\n",
" 'CYP2C19*1, CYP2C19*2, CYP2C19*3, CYP2C19*17',\n",
" 'CYP2C8*1, CYP2C8*2, CYP2C8*3, CYP2C8*4',\n",
" 'CYP2C8*1, CYP2C8*3',\n",
" 'CYP2C9*1, CYP2C9*2, CYP2C9*3',\n",
" 'CYP2C9*1, CYP2C9*2, CYP2C9*3, CYP2C9*13, CYP2C9*14',\n",
" 'CYP2C9*1, CYP2C9*3',\n",
" 'CYP2D6*1, CYP2D6*10',\n",
" 'CYP2D6*1, CYP2D6*1xN',\n",
" 'CYP2D6*1, CYP2D6*1xN, CYP2D6*2, CYP2D6*2xN, CYP2D6*3, CYP2D6*4, CYP2D6*6',\n",
" 'CYP2D6*1, CYP2D6*1xN, CYP2D6*2, CYP2D6*2xN, CYP2D6*4, CYP2D6*5, CYP2D6*10, CYP2D6*35xN',\n",
" 'CYP2D6*1, CYP2D6*1xN, CYP2D6*2xN',\n",
" 'CYP2D6*1, CYP2D6*2, CYP2D6*2xN, CYP2D6*3, CYP2D6*4, CYP2D6*6',\n",
" 'CYP2D6*1, CYP2D6*2, CYP2D6*3, CYP2D6*4, CYP2D6*5, CYP2D6*6, CYP2D6*7, CYP2D6*9, CYP2D6*10, CYP2D6*10x2, CYP2D6*11, CYP2D6*17, CYP2D6*21, CYP2D6*36, CYP2D6*41',\n",
" 'CYP2D6*1, CYP2D6*3, CYP2D6*4',\n",
" 'CYP2D6*1, CYP2D6*3, CYP2D6*4, CYP2D6*5, CYP2D6*6, CYP2D6*10, CYP2D6*17',\n",
" 'CYP2D6*1, CYP2D6*4',\n",
" 'CYP2D6*1, CYP2D6*4, CYP2D6*5, CYP2D6*6, CYP2D6*17, CYP2D6*40',\n",
" 'CYP2D6*5, CYP2D6*17',\n",
" 'CYP3A4*1, CYP3A4*22',\n",
" 'CYP3A4*1, CYP3A4*36',\n",
" 'CYP3A4*1, CYP3A4*4',\n",
" 'CYP3A5*1, CYP3A5*3',\n",
" 'GSTM1 non-null, GSTM1 null',\n",
" 'GSTT1 non-null, GSTT1 null',\n",
" 'HLA-B*15:01:01:01',\n",
" 'HLA-B*38:01:01',\n",
" 'HLA-B*44:02:01:01',\n",
" 'HLA-C*01:02:01, HLA-C*02:02:01, HLA-C*03:02, HLA-C*04:01:01:01, HLA-C*05:01:01:01, HLA-C*06:02:01:01, HLA-C*07:01:01, HLA-C*08:01, HLA-C*12:02:01, HLA-C*14:02:01, HLA-C*15:02:01, HLA-C*16:01:01, HLA-C*17:01:01:01',\n",
" 'HLA-C*06:02:01:01',\n",
" 'HLA-DRB1*04:01:01',\n",
" 'NAT2*4, NAT2*5D, NAT2*6B, NAT2*7A, NAT2*12A, NAT2*13A, NAT2*14A',\n",
" 'SLC6A4 HTTLPR long form (L allele), SLC6A4 HTTLPR short form (S allele)',\n",
" 'SLCO1B1*1, SLCO1B1*14',\n",
" 'TPMT*1, TPMT*3B, TPMT*3C',\n",
" 'UGT1A1*1, UGT1A1*28',\n",
" 'UGT1A1*60',\n",
" 'UGT1A3*1, UGT1A3*2',\n",
" 'UGT2B15*1, UGT2B15*2'}"
]
},
"execution_count": 238,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(no_rs)"
]
},
{
"cell_type": "code",
"execution_count": 172,
"id": "67627cc5",
"metadata": {},
"outputs": [],
"source": [
"import requests"
]
},
{
"cell_type": "code",
"execution_count": 173,
"id": "63d9cbdb",
"metadata": {},
"outputs": [],
"source": [
"def get_pharmvar_result(allele):\n",
"    \"\"\"Look up a star allele (e.g. 'CYP2C9*2') via the PharmVar API and return the parsed JSON.\"\"\"\n",
"    return requests.get(f'https://www.pharmvar.org/api-service/alleles/{allele}').json()"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "3281a20d",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"[{'geneSymbol': 'CYP2C9',\n",
" 'alleleName': 'CYP2C9*2',\n",
" 'pvId': 'PV00538',\n",
" 'legacyLabel': None,\n",
" 'coreAllele': None,\n",
" 'evidenceLevel': '0',\n",
" 'description': None,\n",
" 'function': 'decreased function',\n",
" 'activeInd': True,\n",
" 'references': [{'citation': 'Rettie et al. 1994',\n",
" 'url': 'http://www.ncbi.nlm.nih.gov/pubmed/8004131'},\n",
" {'citation': 'Crespi et al. 1997',\n",
" 'url': 'http://www.ncbi.nlm.nih.gov/pubmed/9241660'},\n",
" {'citation': 'deposited by Gaedigk et al.', 'url': None},\n",
" {'citation': 'King et al. 2004',\n",
" 'url': 'http://www.ncbi.nlm.nih.gov/pubmed/15608560'},\n",
" {'citation': 'Takahashi et al. 2004',\n",
" 'url': 'http://www.ncbi.nlm.nih.gov/pubmed/15070684'},\n",
" {'citation': 'deposited by Campos et al.', 'url': None}],\n",
" 'variants': [{'referenceSequence': 'NC_000010.11',\n",
" 'referenceLocation': 'Sequence Start',\n",
" 'referenceCollections': ['GRCh38'],\n",
" 'hgvs': 'NC_000010.11:g.94942290C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/29',\n",
" 'variantId': '8',\n",
" 'position': 'NC_000010.11:g.94942290C>T'},\n",
" {'referenceSequence': 'NC_000010.10',\n",
" 'referenceLocation': 'Sequence Start',\n",
" 'referenceCollections': ['GRCh37'],\n",
" 'hgvs': 'NC_000010.10:g.96702047C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/31',\n",
" 'variantId': '8',\n",
" 'position': 'NC_000010.10:g.96702047C>T'},\n",
" {'referenceSequence': 'NM_000771.4',\n",
" 'referenceLocation': 'Sequence Start',\n",
" 'referenceCollections': ['RefSeqTranscript'],\n",
" 'hgvs': 'NM_000771.4:c.430C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/13748',\n",
" 'variantId': '8',\n",
" 'position': 'NM_000771.4:c.455C>T'},\n",
" {'referenceSequence': 'NM_000771.4',\n",
" 'referenceLocation': 'ATG Start',\n",
" 'referenceCollections': ['RefSeqTranscript'],\n",
" 'hgvs': 'NM_000771.4:c.430C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/13747',\n",
" 'variantId': '8',\n",
" 'position': 'NM_000771.4:c.430C>T'},\n",
" {'referenceSequence': 'NG_008385.2',\n",
" 'referenceLocation': 'ATG Start',\n",
" 'referenceCollections': ['RefSeqGene'],\n",
" 'hgvs': 'NG_008385.2:g.9133C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/13590',\n",
" 'variantId': '8',\n",
" 'position': 'NG_008385.2:g.3608C>T'},\n",
" {'referenceSequence': 'NG_008385.2',\n",
" 'referenceLocation': 'Sequence Start',\n",
" 'referenceCollections': ['RefSeqGene'],\n",
" 'hgvs': 'NG_008385.2:g.9133C>T',\n",
" 'rsId': 'rs1799853',\n",
" 'impact': 'R144C',\n",
" 'variantFrequency': [{'source': '1000Genomes', 'frequency': 0.047923},\n",
" {'source': 'GnomAD', 'frequency': 0.092016}],\n",
" 'url': 'https://www.pharmvar.org/variant/13589',\n",
" 'variantId': '8',\n",
" 'position': 'NG_008385.2:g.9133C>T'}],\n",
" 'alleleType': 'Core',\n",
" 'url': 'https://www.pharmvar.org/haplotype/PV00538',\n",
" 'hgvs': 'NG_008385.2:g.9133C>T',\n",
" 'variantGroups': []}]"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_pharmvar_result('CYP2C9*2')"
]
},
{
"cell_type": "code",
"execution_count": 178,
"id": "43987077",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'errorMessage': 'Allele NAT2*6 could not be located in the PharmVar database.',\n",
" 'errorCode': 404}"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_pharmvar_result('NAT2*6')"
]
},
{
"cell_type": "markdown",
"id": "e3ee8cc6",
"metadata": {},
"source": [
"### Notes\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"* PharmGKB has more data than it submits to ClinVar\n",
"  * only the top 2 tiers of evidence are submitted; most data is in the 3rd\n",
"* Data is richer than ClinVar's, but a fair amount of it is buried in free-text annotations\n",
"  * in particular the direction of effect\n",
"* Can connect with ClinVar RCVs via PharmGKB's internal identifiers (Clinical Annotation IDs)\n",
"* Most data seems to use rsIDs\n",
"  * in theory we can get consequences via the alleles data (assuming we can get the reference allele)\n",
"* Pharmacogenes with star alleles are few but important\n",
"  * will need some special treatment and possibly use of more resources like PharmVar\n",
"  * maybe parallels with how we handle other complex events in ClinVar"
]
},
{
"cell_type": "markdown",
"id": "a90d7f83",
"metadata": {},
"source": [
"## General\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"Thinking both about PharmGKB data and the more general question of other data sources. Options:\n",
"\n",
"* Add a new data source pipeline\n",
"  * most likely yields more data, even from sources that already submit to ClinVar\n",
"  * can also generalise to sources that don't submit to ClinVar at all\n",
"  * output can be used as additional annotations to ClinVar or as entirely separate submissions\n",
"  * probably more work for us\n",
"* Start parsing submitted records (SCVs) in ClinVar\n",
"  * beneficial if it's common for SCVs to have more info than the RCV\n",
"  * potentially can get data from multiple upstream sources with a single SCV parser\n",
"  * lends itself to enriching \"core\" ClinVar data - ClinVar takes care of linkage\n",
"  * potential for extra/duplicate work aggregating submissions to ClinVar\n"
]
},
{
"cell_type": "markdown",
"id": "0ebd2fcc",
"metadata": {},
"source": [
"### Questions for 29/9 meeting\n",
"\n",
"* Should we start to parse SCVs in ClinVar?\n",
"* Is it worth trying other ways of linking drug & disease within ClinVar?\n",
"* What would be useful to get directly from PharmGKB (besides just more data)?\n",
"* Any familiarity with pharmacogenes, star alleles and other nomenclature?\n",
"* Other questions you have, other info that would be helpful for decision making"
]
},
{
"cell_type": "markdown",
"id": "60dc896e",
"metadata": {},
"source": [
"### Meeting notes\n",
"\n",
"[Top of page](#Table-of-contents)\n",
"\n",
"* existence of the drugResponse field changes the meaning of the disease from the source - check OT is OK with this\n",
"* maybe this is why ClinVar doesn't include disease traits in the RCV - can't confidently associate the variant with the disease, only with the drug response\n",
"* disease traits are potentially more ambiguous - free text, not annotated by ClinVar with xrefs\n",
"  * probably extra manual curation for us\n",
"* are there other terms for efficacy we can consider? Depends on how efficacy is measured\n",
"* same question as for ClinVar - if drug & disease occur in the same record, does it mean the drug is specifically targeting that disease?\n",
"* other things to highlight - really low number of exact efficacy terms; can provide evidence levels from PharmGKB\n",
"* next steps - basically investigation into PharmGKB and/or SCVs, but pending some questions for OT to raise at the next meeting"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}