--- name: "drugbank-database-access" description: "Parse local DrugBank XML for drug info, interactions, targets, and properties. Search by ID/name/CAS, extract DDIs with severity, map targets/enzymes/transporters, compute SMILES similarity. Primary via local XML; REST API rate-limited (3k/month dev). For live bioactivity use chembl-database-bioactivity; for compound properties use pubchem-compound-search." license: "Unknown" --- # DrugBank Database — Local XML Access ## Overview Query the DrugBank comprehensive drug database (14,000+ drug entries, 5,000+ protein targets, 17,000+ drug interactions) by parsing the locally downloaded XML file with Python's ElementTree. Covers drug lookups, interaction checking, target/pathway extraction, chemical property analysis, and cross-database identifier mapping. ## When to Use - Looking up drug information (description, indication, mechanism, pharmacology) by DrugBank ID, name, or CAS number - Checking drug-drug interactions and severity classifications for polypharmacy safety - Extracting drug targets, enzymes, transporters, and carriers with UniProt accessions - Retrieving chemical properties (SMILES, InChI, molecular weight) for cheminformatics analysis - Mapping DrugBank entries to external databases (PubChem, ChEMBL, UniProt, KEGG) - Building drug similarity matrices from molecular fingerprints - For live bioactivity data (IC50, Ki, EC50) use `chembl-database-bioactivity` instead - For compound property lookups without downloading a database use `pubchem-compound-search` instead ## Prerequisites - **DrugBank account**: Register at https://go.drugbank.com/ (free academic license) - **XML download**: Download `drugbank_all_full_database.xml.zip` after registration (~1.5 GB uncompressed) - **Python packages**: `lxml`, `rdkit` (similarity), `pandas` (tabular analysis) - **REST API** (optional): 3,000 req/month dev tier; use local XML for batch work ```bash pip install lxml pandas pip install rdkit-pypi # chemical similarity pip install drugbank-downloader # programmatic XML download ``` ## Quick Start ```python import xml.etree.ElementTree as ET NS = {'db': 'http://www.drugbank.ca'} # Required for ALL XPath queries tree = ET.parse('drugbank_all_full_database.xml') # 30-60s for full XML root = tree.getroot() # Build lookup index (DrugBank ID + lowercase name → element) drug_index = {} for drug in root.findall('db:drug', NS): db_id = drug.find('db:drugbank-id[@primary="true"]', NS) name = drug.find('db:name', NS) if db_id is not None and name is not None: drug_index[db_id.text] = drug drug_index[name.text.lower()] = drug def find_drug(query): """Find drug by DrugBank ID, name (case-insensitive), or CAS number.""" result = drug_index.get(query) or drug_index.get(query.lower()) if result is not None: return result for drug in root.findall('db:drug', NS): # CAS fallback cas = drug.find('db:cas-number', NS) if cas is not None and cas.text == query: return drug return None drug = find_drug('DB00945') # Aspirin name = drug.find('db:name', NS).text print(f"{name}: {drug.find('db:description', NS).text[:100]}...") ``` ## Core API ### 1. Data Access and Setup ```python import xml.etree.ElementTree as ET NS = {'db': 'http://www.drugbank.ca'} tree = ET.parse('drugbank_all_full_database.xml') root = tree.getroot() print(f"Total drug entries: {len(root.findall('db:drug', NS))}") ``` For memory-constrained environments, use iterparse: ```python drug_names = {} for event, elem in ET.iterparse('drugbank_all_full_database.xml', events=('end',)): if elem.tag == '{http://www.drugbank.ca}drug': db_id = elem.find('{http://www.drugbank.ca}drugbank-id[@primary="true"]') name = elem.find('{http://www.drugbank.ca}name') if db_id is not None and name is not None: drug_names[db_id.text] = name.text elem.clear() # Free memory print(f"Parsed {len(drug_names)} drugs via iterparse") ``` ### 2. Drug Information Queries ```python def get_drug_info(drug_element): """Extract comprehensive drug information.""" def txt(path): el = drug_element.find(path, NS) return el.text if el is not None and el.text else None return { 'drugbank_id': txt('db:drugbank-id[@primary="true"]'), 'name': txt('db:name'), 'type': drug_element.get('type'), 'description': txt('db:description'), 'indication': txt('db:indication'), 'mechanism_of_action': txt('db:mechanism-of-action'), 'cas_number': txt('db:cas-number'), 'groups': [g.text for g in drug_element.findall('db:groups/db:group', NS)], } info = get_drug_info(find_drug('Metformin')) print(f"{info['name']} ({info['type']}): Groups={info['groups']}") ``` ```python # Search by name pattern (partial match) def search_by_name(pattern): pattern_lower = pattern.lower() return [d for d in root.findall('db:drug', NS) if d.find('db:name', NS) is not None and pattern_lower in d.find('db:name', NS).text.lower()] statins = search_by_name('statin') print(f"Found {len(statins)} drugs matching 'statin'") ``` ### 3. Drug-Drug Interactions ```python def get_interactions(drug_element): """Extract all drug-drug interactions.""" return [{ 'drugbank_id': i.find('db:drugbank-id', NS).text, 'name': i.find('db:name', NS).text, 'description': i.find('db:description', NS).text, } for i in drug_element.findall('db:drug-interactions/db:drug-interaction', NS)] def classify_severity(description): """Classify severity from interaction description text.""" if not description: return 'unknown' dl = description.lower() if any(w in dl for w in ['contraindicated', 'avoid', 'fatal', 'life-threatening']): return 'major' if any(w in dl for w in ['increase', 'decrease', 'enhance', 'reduce', 'alter']): return 'moderate' return 'minor' interactions = get_interactions(find_drug('Aspirin')) print(f"Aspirin has {len(interactions)} interactions") for i in interactions[:3]: print(f" [{classify_severity(i['description'])}] {i['name']}") ``` ```python # Check pairwise interaction between two drugs def check_interaction(drug1_elem, drug2_elem): id2 = drug2_elem.find('db:drugbank-id[@primary="true"]', NS).text for inter in get_interactions(drug1_elem): if inter['drugbank_id'] == id2: inter['severity'] = classify_severity(inter['description']) return inter return None result = check_interaction(find_drug('Warfarin'), find_drug('Aspirin')) if result: print(f"[{result['severity']}] {result['description'][:150]}") ``` ### 4. Drug Targets and Pathways ```python def get_targets(drug_element, target_type='targets'): """Extract targets/enzymes/transporters/carriers. target_type: 'targets', 'enzymes', 'transporters', or 'carriers' """ results = [] for target in drug_element.findall(f'db:{target_type}/db:{target_type[:-1]}', NS): t = { 'name': (target.find('db:name', NS).text if target.find('db:name', NS) is not None else None), 'actions': [a.text for a in target.findall('db:actions/db:action', NS) if a.text], } poly = target.find('db:polypeptide', NS) if poly is not None: t['uniprot_id'] = poly.get('id') gene = poly.find('db:gene-name', NS) t['gene_name'] = gene.text if gene is not None else None results.append(t) return results drug = find_drug('Imatinib') targets = get_targets(drug, 'targets') enzymes = get_targets(drug, 'enzymes') print(f"Imatinib — Targets: {len(targets)}, Enzymes: {len(enzymes)}") for t in targets[:3]: print(f" {t['name']} (UniProt: {t.get('uniprot_id', 'N/A')}) — {t['actions']}") ``` ```python def get_pathways(drug_element): """Extract SMPDB pathway associations.""" pathways = [] for pw in drug_element.findall('db:pathways/db:pathway', NS): name = pw.find('db:name', NS) smpdb = pw.find('db:smpdb-id', NS) pathways.append({ 'smpdb_id': smpdb.text if smpdb is not None else None, 'name': name.text if name is not None else None, }) return pathways for pw in get_pathways(find_drug('Metformin')): print(f" {pw['smpdb_id']}: {pw['name']}") ``` ### 5. Chemical Properties and Similarity ```python def get_property(drug_element, kind_name, section='calculated'): """Get a single property value by kind name.""" prefix = f'db:{section}-properties/db:property' for prop in drug_element.findall(prefix, NS): kind = prop.find('db:kind', NS) if kind is not None and kind.text == kind_name: return prop.find('db:value', NS).text return None def get_all_properties(drug_element): """Extract all calculated and experimental properties as a dict.""" props = {} for section in ('calculated', 'experimental'): for prop in drug_element.findall(f'db:{section}-properties/db:property', NS): kind = prop.find('db:kind', NS) value = prop.find('db:value', NS) if kind is not None and value is not None: key = f'{section}_{kind.text}' if section == 'experimental' else kind.text props[key] = value.text return props drug = find_drug('Aspirin') print(f"SMILES: {get_property(drug, 'SMILES')}") print(f"MW: {get_property(drug, 'Molecular Weight')}") print(f"LogP: {get_property(drug, 'logP')}") ``` ```python # Tanimoto similarity between drugs using RDKit Morgan fingerprints from rdkit import Chem from rdkit.Chem import AllChem, DataStructs def drug_similarity(drug1_elem, drug2_elem, radius=2, nbits=2048): smi1, smi2 = get_property(drug1_elem, 'SMILES'), get_property(drug2_elem, 'SMILES') if not smi1 or not smi2: return None mol1, mol2 = Chem.MolFromSmiles(smi1), Chem.MolFromSmiles(smi2) if mol1 is None or mol2 is None: return None fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, radius, nBits=nbits) fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, radius, nBits=nbits) return DataStructs.TanimotoSimilarity(fp1, fp2) sim = drug_similarity(find_drug('Aspirin'), find_drug('Ibuprofen')) print(f"Aspirin vs Ibuprofen: {sim:.3f}") ``` ### 6. Cross-Database Integration ```python def get_external_ids(drug_element): """Extract all external database identifiers.""" ids = {} for ident in drug_element.findall('db:external-identifiers/db:external-identifier', NS): resource = ident.find('db:resource', NS) identifier = ident.find('db:identifier', NS) if resource is not None and identifier is not None: ids[resource.text] = identifier.text return ids ids = get_external_ids(find_drug('Imatinib')) print(f"PubChem: {ids.get('PubChem Compound')}, ChEMBL: {ids.get('ChEMBL')}, " f"KEGG: {ids.get('KEGG Drug')}, UniProt: {ids.get('UniProtKB')}") ``` ```python # Build cross-reference table for multiple drugs import pandas as pd def build_crossref_table(names): rows = [] for name in names: d = find_drug(name) if d is None: continue ids = get_external_ids(d) rows.append({'drug': name, 'drugbank_id': d.find('db:drugbank-id[@primary="true"]', NS).text, 'pubchem': ids.get('PubChem Compound'), 'chembl': ids.get('ChEMBL'), 'kegg': ids.get('KEGG Drug')}) return pd.DataFrame(rows) print(build_crossref_table(['Aspirin', 'Metformin', 'Imatinib', 'Warfarin']).to_string(index=False)) ``` ## Key Concepts ### XML Namespace Handling All DrugBank XML queries require the namespace prefix. Without it, XPath returns no results. ```python NS = {'db': 'http://www.drugbank.ca'} name = drug.find('db:name', NS).text # CORRECT name = drug.find('name') # WRONG — returns None! # For iterparse, use full URI: '{http://www.drugbank.ca}drug' ``` ### Drug Entry Structure | Section | XPath | Content | |---------|-------|---------| | Identity | `db:drugbank-id`, `db:name`, `db:cas-number` | Primary identifiers | | Pharmacology | `db:description`, `db:indication`, `db:mechanism-of-action` | Clinical text, mechanism, PD/PK | | Interactions | `db:drug-interactions/db:drug-interaction` | Interacting drugs with descriptions | | Targets | `db:targets/db:target` | Protein targets with actions | | Enzymes/Transporters/Carriers | `db:enzymes/db:enzyme`, etc. | CYP450, P-gp, binding proteins | | Pathways | `db:pathways/db:pathway` | SMPDB pathway associations | | Properties | `db:calculated-properties`, `db:experimental-properties` | SMILES, MW, logP, etc. | | External IDs | `db:external-identifiers` | PubChem, ChEMBL, KEGG, UniProt cross-refs | ### External Identifier Mapping | Resource Name in XML | Database | Example | |----------------------|----------|---------| | `PubChem Compound` | PubChem CID | `2244` | | `ChEMBL` | ChEMBL | `CHEMBL25` | | `KEGG Drug` / `KEGG Compound` | KEGG | `D00109` / `C01405` | | `UniProtKB` | UniProt | `P23219` | | `PharmGKB` | PharmGKB | `PA452615` | | `ChEBI` | ChEBI | `15365` | ### Calculated Property Kinds Common `kind` values: `SMILES`, `InChI`, `InChIKey`, `Molecular Weight`, `Molecular Formula`, `logP`, `logS`, `Polar Surface Area (PSA)`, `Rotatable Bond Count`, `H Bond Acceptor Count`, `H Bond Donor Count`, `pKa (strongest acidic)`, `pKa (strongest basic)`, `Rule of Five`, `Bioavailability`. ## Common Workflows ### Workflow 1: Drug Discovery Target Analysis **Goal**: Find all drugs targeting a specific gene and analyze their properties. ```python import xml.etree.ElementTree as ET import pandas as pd NS = {'db': 'http://www.drugbank.ca'} tree = ET.parse('drugbank_all_full_database.xml') root = tree.getroot() target_gene = 'EGFR' records = [] for drug in root.findall('db:drug', NS): for target in drug.findall('db:targets/db:target', NS): poly = target.find('db:polypeptide', NS) if poly is None: continue gene = poly.find('db:gene-name', NS) if gene is not None and gene.text == target_gene: actions = [a.text for a in target.findall('db:actions/db:action', NS) if a.text] records.append({ 'drugbank_id': drug.find('db:drugbank-id[@primary="true"]', NS).text, 'name': drug.find('db:name', NS).text, 'groups': ', '.join(g.text for g in drug.findall('db:groups/db:group', NS)), 'actions': ', '.join(actions), }) df = pd.DataFrame(records) print(f"Drugs targeting {target_gene}: {len(df)}") print(df.to_string(index=False)) ``` ### Workflow 2: Polypharmacy Safety Screening **Goal**: Screen a medication list for all pairwise interactions with severity ranking. ```python import xml.etree.ElementTree as ET import pandas as pd NS = {'db': 'http://www.drugbank.ca'} tree = ET.parse('drugbank_all_full_database.xml') root = tree.getroot() # Build index and interaction maps idx = {} inter_map = {} # drugbank_id → {interacting_id: description} for drug in root.findall('db:drug', NS): name = drug.find('db:name', NS) db_id = drug.find('db:drugbank-id[@primary="true"]', NS) if name is None or db_id is None: continue idx[name.text.lower()] = db_id.text imap = {} for i in drug.findall('db:drug-interactions/db:drug-interaction', NS): imap[i.find('db:drugbank-id', NS).text] = i.find('db:description', NS).text inter_map[db_id.text] = imap medications = ['Warfarin', 'Aspirin', 'Omeprazole', 'Atorvastatin', 'Metformin'] report = [] med_ids = [(m, idx.get(m.lower())) for m in medications] for i, (n1, id1) in enumerate(med_ids): if not id1: continue for n2, id2 in med_ids[i+1:]: if not id2: continue desc = inter_map.get(id1, {}).get(id2) if desc: dl = desc.lower() sev = ('MAJOR' if any(w in dl for w in ['contraindicated','avoid','fatal']) else 'MODERATE' if any(w in dl for w in ['increase','decrease','enhance','reduce']) else 'MINOR') report.append({'Drug 1': n1, 'Drug 2': n2, 'Severity': sev, 'Description': desc[:120]}) df = pd.DataFrame(report) print(f"=== Polypharmacy Report: {len(medications)} medications, {len(df)} interactions ===") if not df.empty: print(df.sort_values('Severity').to_string(index=False)) ``` ## Key Parameters | Parameter | Function/Endpoint | Default | Description | |-----------|-------------------|---------|-------------| | `NS` (namespace dict) | All XPath queries | `{'db': 'http://www.drugbank.ca'}` | Required for all `find`/`findall` calls | | `@primary="true"` | `db:drugbank-id` | — | Selects the primary DrugBank ID (DB00XXX) vs secondary IDs | | `target_type` | `get_targets()` | `'targets'` | One of: `targets`, `enzymes`, `transporters`, `carriers` | | `radius` | Morgan fingerprint | `2` | Fingerprint radius; 2 = ECFP4, 3 = ECFP6 | | `nbits` | Morgan fingerprint | `2048` | Bit vector length; higher = fewer hash collisions | | `events` | `ET.iterparse()` | — | Parse events; use `('end',)` to fire on closing tags | ## Best Practices 1. **Build an in-memory index on startup**: Parse once (30-60s), build dict by ID + lowercase name. Never re-parse inside a loop 2. **Always pass the namespace dict**: Every `find()`/`findall()` needs `NS`. Omitting it is the #1 source of empty results 3. **Use `iterparse` for memory constraints**: With `elem.clear()`, avoids loading the full 1.5 GB tree 4. **Guard against None**: Not all drugs have all fields. Always check `el is not None` before `.text` 5. **Prefer calculated over experimental properties**: Calculated (SMILES, logP, MW) available for nearly all drugs 6. **Cache interaction maps for polypharmacy**: Pre-build `{drug_id: {interacting_id: desc}}` once ## Common Recipes ### Recipe: Export All Drug Properties to CSV ```python import pandas as pd records = [] for drug in root.findall('db:drug', NS): row = {'drugbank_id': drug.find('db:drugbank-id[@primary="true"]', NS).text, 'name': drug.find('db:name', NS).text, 'type': drug.get('type')} for prop in drug.findall('db:calculated-properties/db:property', NS): row[prop.find('db:kind', NS).text] = prop.find('db:value', NS).text records.append(row) pd.DataFrame(records).to_csv('drugbank_properties.csv', index=False) print(f"Exported {len(records)} drugs") ``` ### Recipe: Lipinski Rule-of-5 Filter ```python def check_lipinski(drug_element): props = get_all_properties(drug_element) try: mw = float(props.get('Molecular Weight', 9999)) logp = float(props.get('logP', 99)) hba = int(props.get('H Bond Acceptor Count', 99)) hbd = int(props.get('H Bond Donor Count', 99)) except (ValueError, TypeError): return None violations = sum([mw > 500, logp > 5, hba > 10, hbd > 5]) return {'MW': mw, 'logP': logp, 'HBA': hba, 'HBD': hbd, 'violations': violations, 'passes': violations <= 1} print(check_lipinski(find_drug('Imatinib'))) ``` ### Recipe: Find CYP450 Substrates ```python cyp = 'CYP3A4' substrates = [] for drug in root.findall('db:drug', NS): for enz in drug.findall('db:enzymes/db:enzyme', NS): n = enz.find('db:name', NS) if n is not None and n.text and cyp.lower() in n.text.lower(): actions = [a.text.lower() for a in enz.findall('db:actions/db:action', NS) if a.text] if 'substrate' in actions: substrates.append(drug.find('db:name', NS).text) print(f"{cyp} substrates: {len(substrates)}") ``` ## Troubleshooting | Problem | Cause | Solution | |---------|-------|----------| | `find()` returns `None` for known elements | Missing XML namespace | Always pass `NS = {'db': 'http://www.drugbank.ca'}` to `find()`/`findall()` | | `MemoryError` parsing full XML | ~2-3 GB in memory | Use `ET.iterparse()` with `elem.clear()` | | Slow startup (>60s) | Parsing 1.5 GB XML | Parse once, build index dict; avoid re-parsing | | Drug not found by name | Case sensitivity or alternate name | Normalize to lowercase; try CAS or DrugBank ID | | Empty `calculated-properties` | Biotech/protein drugs lack SMILES | Check `drug.get('type')` — biotech drugs have no small-molecule properties | | `AttributeError: 'NoneType'` | Optional XML element absent | Guard with `el is not None` before `.text` | | Asymmetric interaction counts | Interactions not symmetric in XML | Check both directions or build symmetric index | | `drugbank-downloader` auth failure | Invalid credentials | Verify account at https://go.drugbank.com/ | | REST API `429` | Exceeded rate limit | Switch to local XML for batch queries | ## Bundled Resources **`references/interactions_targets.md`** — Consolidates interactions (severity heuristics, batch screening, description parsing) and targets/pathways (polypeptide details, action catalogs, enzyme/transporter coverage, pathway enrichment). Relocated inline: basic extraction (Core API 3-4). Omitted: verbose per-field parsing duplicating Core API. **`references/chemical_analysis.md`** — Property extraction, descriptor computation, fingerprint similarity, drug-likeness filtering. Covers: full property catalog, similarity matrices, substructure search. Relocated inline: SMILES/InChI extraction + Tanimoto (Core API 5), Lipinski (Recipe). Omitted: 3D conformers (use rdkit-cheminformatics). **Original disposition** (2,717 lines: SKILL.md 190 + 5 refs 2,166 + script 351): - `SKILL.md` (190) — Stub rewritten with 6 Core API modules - `data-access.md` (243) → Core API 1 + Quick Start - `drug-queries.md` (387) → Core API 2 - `interactions.md` (426) → `references/interactions_targets.md` + Core API 3 - `targets-pathways.md` (519) → `references/interactions_targets.md` + Core API 4 - `chemical-analysis.md` (591) → `references/chemical_analysis.md` + Core API 5 - `drugbank_helper.py` (351) — Thin wrappers: `find_drug` → Quick Start; `get_drug_info`/`search_by_name` → Core API 2; `get_interactions`/`check_interaction`/`check_polypharmacy` → Core API 3; `get_targets` → Core API 4; `get_properties`/`get_smiles`/`get_inchi` → Core API 5 **Retention**: ~550 lines SKILL.md. With references (~600), aggregate ~1,150 / 2,717 = ~42%. Stub original; 5 refs → 2 via ceil(5/3)=2. ## Related Skills - **chembl-database-bioactivity** — Live bioactivity database (IC50, Ki, EC50); complements DrugBank's static drug catalog - **pubchem-compound-search** — Public compound property lookups without downloading a database - **rdkit-cheminformatics** — Full cheminformatics toolkit for 3D conformers, advanced fingerprints, descriptors beyond DrugBank properties ## References - DrugBank website: https://go.drugbank.com/ - DrugBank XML schema: https://docs.drugbank.com/xml/ - drugbank-downloader: https://pypi.org/project/drugbank-downloader/ - Wishart DS et al. (2018). DrugBank 5.0. *Nucleic Acids Res.* 46(D1):D1074-D1082. https://doi.org/10.1093/nar/gkx1037