---
name: bioservices
description: "Primary Python tool for 40+ bioinformatics services. Preferred for multi-database workflows: UniProt, KEGG, ChEMBL, PubChem, Reactome, QuickGO. Unified API for queries, ID mapping, pathway analysis. For direct REST control, use individual database skills (uniprot-database, kegg-database)."
---

# BioServices

## Overview

BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Use it to retrieve biological data, run cross-database queries, map identifiers, analyze sequences, and combine multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.

## When to Use This Skill

Use this skill when:

- Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
- Analyzing metabolic pathways and gene functions via KEGG or Reactome
- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
- Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
- Running sequence similarity searches and alignments (BLAST, MUSCLE)
- Querying gene ontology terms (QuickGO, GO annotations)
- Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
- Mining genomic data (BioMart, ArrayExpress, ENA)
- Integrating data from multiple bioinformatics resources in a single workflow

## Core Capabilities

### 1. Protein Analysis

Retrieve protein information, sequences, and functional annotations:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

# Search for a protein by entry name
# (format and column names follow the current UniProt REST API)
results = u.search("ZAP70_HUMAN", frmt="tsv", columns="accession,gene_names,organism_name")

# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
```

**Key methods:**

- `search()`: Query UniProt with flexible search terms
- `retrieve()`: Get protein entries in various formats (FASTA, XML, tab)
- `mapping()`: Convert identifiers between databases

Reference: `references/services_reference.md` for complete UniProt API details.

### 2. Pathway Discovery and Analysis

Access KEGG pathway information for genes and organisms:

```python
from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene

# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations']  # Protein-protein interactions

# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")
```

**Key methods:**

- `lookfor_organism()`, `lookfor_pathway()`: Search by name
- `get_pathway_by_gene()`: Find pathways containing genes
- `parse_kgml_pathway()`: Extract structured pathway data
- `pathway2sif()`: Get protein interaction networks

Reference: `references/workflow_patterns.md` for complete pathway analysis workflows.

### 3. Compound Database Searches

Search and cross-reference compounds across multiple databases:

```python
from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315
```

**Common workflow:**

1. Search compound by name in KEGG
2. Extract KEGG compound ID
3. Use UniChem for KEGG → ChEMBL mapping
4. Read ChEBI IDs from the KEGG entry, where they are often provided

Reference: `references/identifier_mapping.md` for complete cross-database mapping guide.

### 4. Sequence Analysis

Run BLAST searches and sequence alignments:

```python
from bioservices import NCBIblast

s = NCBIblast(verbose=False)

# protein_sequence: amino-acid sequence as a string (define it before this call)
# Run BLASTP against UniProtKB
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com"  # Required by the EBI job service behind NCBIblast
)

# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")
```

**Note:** BLAST jobs are asynchronous. Check that the job has finished before retrieving results.

### 5. Identifier Mapping

Convert identifiers between different biological databases:

```python
from bioservices import UniProt, UniChem

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403"         # Identifier(s) to convert
)

# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")

# For compounds, use UniChem
uc = UniChem()
chembl_from_kegg = uc.get_compound_id_from_kegg("C11222")
```

**Supported mappings (UniProt):**

- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see `references/identifier_mapping.md`)

### 6. Gene Ontology Queries

Access GO terms and annotations:

```python
from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
```

### 7. Protein-Protein Interactions

Query interaction databases via PSICQUIC:

```python
from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs
```

**Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others.

## Multi-Service Integration Workflows

BioServices is most useful when several services are combined in one analysis. Common integration patterns:

### Complete Protein Analysis Pipeline

Execute a full protein characterization workflow:

```bash
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
```

This script demonstrates:

1. UniProt search for protein entry
2. FASTA sequence retrieval
3. BLAST similarity search
4. KEGG pathway discovery
5. PSICQUIC interaction mapping

### Pathway Network Analysis

Analyze all pathways for an organism:

```bash
python scripts/pathway_analysis.py hsa output_directory/
```

The script:

- Extracts all pathway IDs for the organism
- Extracts protein-protein interactions per pathway
- Summarizes interaction type distributions
- Exports to CSV/SIF formats

### Cross-Database Compound Search

Map compound identifiers across databases:

```bash
python scripts/compound_cross_reference.py Geldanamycin
```

Retrieves:

- KEGG compound ID
- ChEBI identifier
- ChEMBL identifier
- Basic compound properties

### Batch Identifier Conversion

Convert multiple identifiers at once:

```bash
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
```

## Best Practices

### Output Format Handling

Services return data in different formats, each with a natural handling strategy:

- **XML**: Parse using BeautifulSoup (most SOAP services)
- **Tab-separated (TSV)**: Load into Pandas DataFrames for tabular data
- **Dictionary/JSON**: Manipulate directly in Python
- **FASTA**: Pass to BioPython for sequence analysis

### Rate Limiting and Verbosity

Control API request behavior:

```python
from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
```

### Error Handling

Wrap service calls in try-except blocks:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
```

### Organism Codes

Use standard organism abbreviations:

- `hsa`: Homo sapiens (human)
- `mmu`: Mus musculus (mouse)
- `dme`: Drosophila melanogaster
- `sce`: Saccharomyces cerevisiae (yeast)

List all organisms: `k.list("organism")` or `k.organismIds`

### Integration with Other Tools

BioServices works well with:

- **BioPython**: Sequence analysis on retrieved FASTA data
- **Pandas**: Tabular data manipulation
- **PyMOL**: 3D structure visualization (retrieve PDB IDs)
- **NetworkX**: Network analysis of pathway interactions
- **Galaxy**: Custom tool wrappers for workflow platforms

## Resources

### scripts/

Executable Python scripts demonstrating complete workflows:

- `protein_analysis_workflow.py`: End-to-end protein characterization
- `pathway_analysis.py`: KEGG pathway discovery and network extraction
- `compound_cross_reference.py`: Multi-database compound searching
- `batch_id_converter.py`: Bulk identifier mapping utility

Scripts can be executed directly or adapted for specific use cases.

### references/

Detailed documentation loaded as needed:

- `services_reference.md`: Comprehensive list of all 40+ services with methods
- `workflow_patterns.md`: Detailed multi-step analysis workflows
- `identifier_mapping.md`: Complete guide to cross-database ID conversion

Load references when working with specific services or complex integration tasks.

## Installation

```bash
pip install bioservices
```

pip installs the dependencies automatically. The package is tested on Python 3.9-3.12.

## Additional Information

For detailed API documentation and advanced features, refer to:

- Official documentation: https://bioservices.readthedocs.io/
- Source code: https://github.com/cokelaer/bioservices
- Service-specific references in `references/services_reference.md`
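
### Sketch: Waiting for an Asynchronous Job

The Sequence Analysis section notes that BLAST jobs are asynchronous and that status should be checked before results are retrieved. One way to do that is a small polling loop, sketched below. `wait_for_job` is a hypothetical helper, not part of BioServices, and the status strings (`FINISHED`, `ERROR`, `FAILURE`, `NOT_FOUND`) are assumptions about what the status call returns; adjust them to what the service actually reports.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("FINISHED",),  # assumed success status
    failed_states: Iterable[str] = ("ERROR", "FAILURE", "NOT_FOUND"),  # assumed failure statuses
    interval: float = 5.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a done or failed state.

    Returns the final status on success, raises RuntimeError on a failed
    state, and raises TimeoutError after max_checks unsuccessful checks.
    """
    done = set(done_states)
    failed = set(failed_states)
    for _ in range(max_checks):
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"Job ended with status {status!r}")
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"Job not finished after {max_checks} status checks")


# Possible use with the objects from the Sequence Analysis example:
#   wait_for_job(lambda: s.getStatus(jobid))
#   results = s.getResult(jobid, "out")
```

Passing the status call as a function keeps the helper independent of any one service, so the same loop could be reused for other job-based services that expose a status method.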