--- name: setup description: Set up the ENCODE Toolkit server connection. Use when the user needs help installing, configuring, or troubleshooting the ENCODE connector. disable-model-invocation: true --- # ENCODE Toolkit Setup ## When to Use - User needs help installing or configuring the ENCODE Toolkit MCP server - User is getting connection errors or server startup failures - User asks "how do I set up ENCODE?" or "install ENCODE toolkit" - User needs to configure ENCODE credentials for restricted data access - User wants to verify their ENCODE server connection is working - User is setting up a new environment and needs the ENCODE plugin Help the user set up the ENCODE Toolkit server. The server connects Claude to the ENCODE Project genomics database — the largest public catalog of functional genomic elements with 8,000+ experiments across 50+ assay types. ## Installation The ENCODE Toolkit server is installed via `uvx` (recommended) or `pip`: ### For Claude Code (CLI) ```bash claude mcp add encode -- uvx encode-toolkit ``` ### For Claude Desktop Add to `claude_desktop_config.json`: - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json` - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json` ```json { "mcpServers": { "encode": { "command": "uvx", "args": ["encode-toolkit"] } } } ``` Then restart Claude Desktop. ### For VS Code (Claude Extension) Add to your VS Code `settings.json` (Ctrl/Cmd + Shift + P → "Preferences: Open Settings (JSON)"): ```json { "claude.mcpServers": { "encode": { "command": "uvx", "args": ["encode-toolkit"] } } } ``` ### For Cursor Add to `.cursor/mcp.json` in your project root: ```json { "mcpServers": { "encode": { "command": "uvx", "args": ["encode-toolkit"] } } } ``` ### For Windsurf Add to `~/.codeium/windsurf/mcp_config.json`: ```json { "mcpServers": { "encode": { "command": "uvx", "args": ["encode-toolkit"] } } } ``` ### Alternative: pip install ```bash pip install encode-toolkit encode-toolkit # Run the server ``` --- ## Verify Installation After setup, test the connection with these verification queries (run them in order): ### Step 1: Check metadata access Ask: "List available ENCODE assay types" - This calls `encode_get_metadata(metadata_type="assays")` - Expected: Returns 50+ assay types including ChIP-seq, ATAC-seq, RNA-seq, WGBS, Hi-C ### Step 2: Test search Ask: "Search for ATAC-seq experiments on human brain" - This calls `encode_search_experiments(assay_title="ATAC-seq", organ="brain", organism="Homo sapiens")` - Expected: Returns experiment accessions (ENCSR...) with assay, biosample, and status info ### Step 3: Test facets Ask: "What organs have the most ENCODE data?" - This calls `encode_get_facets(facet_field="organ")` - Expected: Returns organ counts showing brain, liver, heart, etc. ranked by experiment count If all three work, your setup is complete. --- ## Authentication Most ENCODE data is public and needs no authentication. For restricted/unreleased data: 1. Get API credentials from https://www.encodeproject.org/profile/ (requires ENCODE account) 2. Store them: ``` Ask: "Store my ENCODE credentials" → Calls encode_manage_credentials(action="store", access_key="...", secret_key="...") ``` 3. Credentials are encrypted via the OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service) 4. To verify: `encode_manage_credentials(action="status")` 5. To remove: `encode_manage_credentials(action="remove")` --- ## 20 Available Tools After setup, these tools are available: | Category | Tools | Purpose | |----------|-------|---------| | **Search** | `encode_search_experiments`, `encode_get_facets`, `encode_get_metadata` | Find experiments, explore data landscape, get valid filter values | | **Experiment Details** | `encode_get_experiment`, `encode_compare_experiments` | Get full experiment metadata, compare two experiments | | **Files** | `encode_search_files`, `encode_list_files`, `encode_get_file_info` | Find files, list files for an experiment, get file details | | **Download** | `encode_download_files`, `encode_batch_download` | Download individual or batch files with MD5 verification | | **Tracking** | `encode_track_experiment`, `encode_list_tracked`, `encode_get_tracking_summary` | Local experiment tracking with SQLite | | **Provenance** | `encode_log_derived_file`, `encode_get_provenance` | Log analysis outputs with full lineage | | **Citations** | `encode_get_citations`, `encode_link_reference` | Publication data, cross-reference to PubMed/GEO | | **Credentials** | `encode_manage_credentials` | Store/remove API credentials | | **Collection** | `encode_summarize_collection` | Summarize tracked experiment portfolio | --- ## First-Run Walkthrough: Pancreatic Islet Epigenomics This walkthrough demonstrates a complete workflow from installation to data exploration. ### 1. Explore what's available ``` "What ENCODE assay types are available for human pancreas?" → encode_get_facets(facet_field="assay_title", organ="pancreas", organism="Homo sapiens") ``` ### 2. Find specific experiments ``` "Find all histone ChIP-seq experiments on human pancreas" → encode_search_experiments(assay_title="Histone ChIP-seq", organ="pancreas", organism="Homo sapiens") ``` ### 3. Examine an experiment ``` "Get details for ENCSR123ABC" → encode_get_experiment(accession="ENCSR123ABC") ``` ### 4. Find the right files ``` "List the preferred BED files for ENCSR123ABC" → encode_list_files(accession="ENCSR123ABC", file_format="bed", assembly="GRCh38") ``` ### 5. Download data ``` "Download the IDR-thresholded peaks for ENCSR123ABC" → encode_download_files(accession="ENCSR123ABC", file_format="bed", output_type="IDR thresholded peaks") ``` ### 6. Track your experiment ``` "Track ENCSR123ABC in my local database with note 'H3K27ac pancreatic islets'" → encode_track_experiment(accession="ENCSR123ABC", notes="H3K27ac pancreatic islets") ``` --- ## Cross-Database Integration The ENCODE Toolkit works alongside other MCP servers and REST APIs: | Database | Access Method | What It Adds | |----------|--------------|--------------| | **PubMed** | MCP server (`search_articles`) | Literature citations for ENCODE experiments | | **bioRxiv** | MCP server (`search_preprints`) | Preprint discovery for latest research | | **ClinicalTrials.gov** | MCP server (`search_trials`) | Clinical trial cross-reference | | **Open Targets** | MCP server (`query_open_targets_graphql`) | Drug target identification | | **GTEx** | REST API via skill | Tissue-specific expression context | | **ClinVar** | REST API via skill | Clinical variant annotation | | **GWAS Catalog** | REST API via skill | Trait-associated variant lookups | | **gnomAD** | GraphQL via skill | Population allele frequencies | | **Ensembl** | REST API via skill | VEP annotation, Regulatory Build | | **UCSC** | REST API via skill | Genome browser tracks, cCRE data | | **GEO** | E-utilities via skill | Complementary expression datasets | | **JASPAR** | REST API via skill | TF binding motif databases | | **CellxGene** | REST API via skill | Single-cell expression atlases | --- ## 47 Expert Skills Beyond the 20 tools, the ENCODE Toolkit includes 47 skills providing domain expertise: - **Core (5)**: setup, search-encode, download-encode, track-experiments, cross-reference - **Analysis (9)**: quality-assessment, integrative-analysis, regulatory-elements, epigenome-profiling, compare-biosamples, visualization-workflow, motif-analysis, peak-annotation, batch-analysis - **Pipelines (7)**: pipeline-chipseq, pipeline-atacseq, pipeline-rnaseq, pipeline-wgbs, pipeline-hic, pipeline-dnaseseq, pipeline-cutandrun - **External DBs (9)**: gtex-expression, clinvar-annotation, cellxgene-context, gwas-catalog, jaspar-motifs, ensembl-annotation, geo-connector, gnomad-variants, ucsc-browser - **Workflows (10)**: data-provenance, cite-encode, variant-annotation, pipeline-guide, single-cell-encode, disease-research, publication-trust, bioinformatics-installer, scientific-writing, liftover-coordinates - **Data Aggregation (4)**: histone-aggregation, accessibility-aggregation, hic-aggregation, methylation-aggregation - **Meta-Analysis (2)**: scrna-meta-analysis, multi-omics-integration - **Functional Genomics (1)**: functional-screen-analysis --- ## Pitfalls & Troubleshooting | Problem | Cause | Fix | |---------|-------|-----| | "Server not found" | Claude not restarted after config change | Restart Claude Desktop / reload Claude Code | | "uvx not found" | uv not installed | `curl -LsSf https://astral.sh/uv/install.sh \| sh` | | Timeout errors | Slow connection or ENCODE API load | Retry; rate limit (10 req/sec) is handled automatically | | 403 on downloads | File requires authentication | `encode_manage_credentials(action="store", ...)` | | No results returned | Filters too narrow | Broaden filters; use `encode_get_facets` to see available data | | "Invalid accession" | Wrong format | Must be ENCSR/ENCFF/ENCBS format (e.g., ENCSR000AAA) | | Empty facets | API connectivity issue | Check internet; try `encode_get_metadata(metadata_type="assays")` | | Stale results | Cached data | Cache TTL is 1 hour; restart server to clear | --- ## Code Examples ### 1. Verify server connection with metadata query ``` encode_get_metadata(metadata_type="assays") ``` Expected output: ```json { "assays": ["ATAC-seq", "ChIP-seq", "CUT&RUN", "CUT&Tag", "DNase-seq", "Hi-C", "MPRA", "RNA-seq", "STARR-seq", "WGBS", "eCLIP", "scATAC-seq", "scRNA-seq"] } ``` ### 2. Test search functionality ``` encode_search_experiments(assay_title="ATAC-seq", organ="brain", organism="Homo sapiens", limit=3) ``` Expected output: ```json { "total": 32, "results": [ {"accession": "ENCSR000AAA", "assay_title": "ATAC-seq", "biosample_summary": "brain", "status": "released"} ] } ``` ### 3. Test facet exploration ``` encode_get_facets(facet_field="organ", organism="Homo sapiens") ``` Expected output: ```json { "facets": { "organ": {"brain": 450, "blood": 380, "liver": 220, "heart": 180, "lung": 150} } } ``` ## Related Skills | Skill | When to Use | |-------|------------| | `search-encode` | First skill to use after setup — find experiments by assay, tissue, target | | `download-encode` | Download ENCODE files (BED, bigWig, FASTQ, BAM) after finding experiments | | `pipeline-guide` | Set up Nextflow pipelines for processing raw ENCODE data | | `bioinformatics-installer` | Install all bioinformatics tools needed for ENCODE analysis | | `cross-reference` | Link ENCODE experiments to PubMed, GEO, ClinicalTrials.gov | | `quality-assessment` | Evaluate data quality before analysis | | `publication-trust` | Verify literature claims backing analytical decisions | --- ## Presenting Results When reporting setup results: - **Connection status**: Confirm the ENCODE Toolkit server is connected and responding. Report the server version if available - **Available tools**: List the 20 available ENCODE tools grouped by function (search, download, track, cross-reference, credentials) - **Test query result**: Run a simple validation query (e.g., `encode_get_metadata(metadata_type="assays")`) and confirm it returns results successfully - **Authentication status**: Note whether credentials are configured (for restricted data) or that public data access requires no authentication - **Troubleshooting**: If any issues were encountered during setup, summarize the problem and resolution - **Next steps**: Suggest `search-encode` to find experiments, or `encode_get_facets` to explore what ENCODE data is available for their research area ## For the request: "$ARGUMENTS"