{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Gene data using NCBI Datasets command line tools\n", "The objective of this Notebook is to demonstrate how to use NCBI Datasets command line tools to explore and download sequence and metadata for RefSeq annotated genes.\n", "\n", "The datasets command-line tool currently returns two types of data:\n", " - Gene summaries are gene metadata returned in [JSON](https://www.json.org/json-en.html) format\n", " - Gene data packages are downloadable zip files including gene, transcript and protein sequence, a data table and a data report in [JSON Lines](https://jsonlines.org/) format. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started\n", "To get started, we'll first download the datasets command line tools and grant them execute permissions. \n", "Datasets has two command line tools:\n", "- The **datasets** tool is used to query and download sequence, annotation and metadata for all domains of life.\n", "- The **dataformat** tool is used to convert metadata downloaded from NCBI Datasets from JSON Lines format to other formats." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloading CLI tools...\n[size: 11M] datasets v11.7.0\n[size: 13M] dataformat v11.7.0\n" ] } ], "source": [ "%%bash\n", "printf \"Downloading CLI tools...\\n\"\n", "for app in datasets dataformat\n", "do\n", " curl --silent --remote-name \"https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/${app}\"\n", " chmod +x ${app}\n", " printf \"[size: %s] %s v%s\\n\" $(du --human-readable ${app}) $(./${app} version)\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also download the command line tool [jq](https://stedolan.github.io/jq/) to parse the datasets JSON Lines data reports into a readable format." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded jq-1.6" ] } ], "source": [ "%%bash\n", "curl --silent --location --output jq 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64'\n", "chmod +x jq\n", "printf \"Downloaded %s\" $(./jq --version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting help\n", "To get help with the datasets tools, or with any command or sub-command, specify `--help` after the command." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "datasets is a command-line tool that is used to query and download biological sequence data\nacross all domains of life from NCBI databases.\n\nRefer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.\n\nUsage\n datasets [command]\n\nData Retrieval Commands\n summary print a summary of a gene or genome dataset\n download download a gene, genome or coronavirus dataset as a zip file\n rehydrate rehydrate a downloaded, dehydrated dataset\n\nMiscellaneous Commands\n completion generate autocompletion scripts\n version print the version of this client and exit\n help Help about any command\n\nFlags\n -h, --help help for datasets\n\nUse datasets help for detailed help about a command.\n" ] } ], "source": [ "!./datasets --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting gene metadata for a list of Crassostrea virginica genes\n", "In this step, we'll use the Datasets summary gene command to get gene metadata for a list of Crassostrea virginica genes. Datasets gene summaries can be queried using NCBI Gene ID, gene symbol or RefSeq transcript or protein accession combined with a taxon name. 
In this example, we'll query for 3 Crassostrea virginica genes, LOC111112135, LOC111112138, LOC111110223, by specifying gene symbol and taxon name. To make the JSON output easy to read we'll use the command line parser jq." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"genes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"gene\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"annotations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"assemblies_in_scope\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_002022765.2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"C_virginica-3.0\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2017-09-11\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Crassostrea virginica Annotation Release 100\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"8\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"common_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"eastern oyster\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"description\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 6\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"gene_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"111110223\"\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"genomic_ranges\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"nomenclature_authority\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"symbol\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"LOC111110223\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6565\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"taxname\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Crassostrea virginica\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"transcripts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022446624.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022446624.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"304\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"1365\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695129\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691394\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691585\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70690196\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"70695129\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691394\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691585\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70690196\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 8 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3017\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XP_022302332.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"isoform_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"isoform X2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m353\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 6\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022446623.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"XM_022446623.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"304\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2562\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695129\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691585\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695129\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70691585\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 8 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035787.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70687673\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"70695429\"\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"minus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m4214\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XP_022302331.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"isoform_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"isoform X1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m752\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 13\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"query\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"LOC111110223\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"gene\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"annotations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"assemblies_in_scope\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"GCF_002022765.2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"C_virginica-3.0\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2017-09-11\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Crassostrea virginica Annotation Release 100\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"9\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"common_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"eastern oyster\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"description\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 13\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"gene_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"111112135\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_ranges\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401835\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"nomenclature_authority\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"symbol\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"LOC111112135\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6565\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"taxname\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Crassostrea virginica\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"transcripts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449492.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449492.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"243\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2570\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401835\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401901\"\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101402029\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101402206\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101403913\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401835\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401901\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101402029\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101402206\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"101403913\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401835\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 9 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401835\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;39m2654\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XP_022305200.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"isoform_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"isoform X2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m775\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 13\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449491.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449491.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"49\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2379\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n 
\u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401848\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401901\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101403913\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401848\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401901\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101403913\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n 
\u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401848\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 9 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101401848\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101406321\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2463\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XP_022305199.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"isoform_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"isoform X1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;39m776\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 13\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"query\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"LOC111112135\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"gene\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"annotations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"assemblies_in_scope\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_002022765.2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"C_virginica-3.0\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2017-09-11\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"release_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Crassostrea virginica Annotation Release 100\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"9\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"common_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"eastern 
oyster\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"description\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 4\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"gene_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"111112138\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_ranges\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349832\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"nomenclature_authority\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"symbol\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"LOC111112138\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6565\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"taxname\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Crassostrea virginica\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"transcripts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449500.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n 
\u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449500.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"106\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2412\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349832\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349916\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101354401\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349832\"\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349916\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101354401\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349832\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 9 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101349832\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2632\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XP_022305208.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m768\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 4\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449501.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"cds\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"XM_022449501.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"143\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2449\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n 
\u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352482\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352603\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101354401\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_locations\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"exons\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352482\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352603\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101354401\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"order\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n 
\u001b[0m\u001b[34;1m\"genomic_accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352482\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"sequence_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome 9 Reference C_virginica-3.0 Primary Assembly\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"genomic_range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NC_035788.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"range\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"begin\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101352482\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"end\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"101356947\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"orientation\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"plus\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2669\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"transcript variant X2\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"protein\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n \u001b[0m\u001b[34;1m\"accession_version\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"XP_022305209.1\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m768\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"toll-like receptor 4\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING_MODEL\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROTEIN_CODING\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"query\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n \u001b[0;32m\"LOC111112138\"\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n \u001b[1;39m]\u001b[0m\u001b[1;39m\n\u001b[1;39m}\u001b[0m\n" ] } ], "source": [ "!./datasets summary gene symbol LOC111112135 LOC111112138 LOC111110223 --taxon \"crassostrea virginica\" | ./jq ." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Downloading gene sequence, annotation and metadata\n", "Next, we'll use the Datasets command line tool to download a **gene data package** containing gene, transcript and protein sequence, a data report and a data table. The gene data reports contain detailed gene metadata in a hierarchical JSON Lines format. The gene table contains a subset of gene metadata in tsv format. Gene data packages can be queried using NCBI Gene ID, gene symbol or RefSeq transcript or protein accession combined with a taxon name. 
\n", "\n", "The default gene dataset includes the following files:\n", "- gene.fna (gene sequences)\n", "- rna.fna (transcript sequences)\n", "- protein.faa (protein sequences)\n", "- data_report.jsonl (data report with gene metadata)\n", "- data_table.tsv (data table with gene metadata, one transcript per row)\n", "- dataset_catalog.json (a list of files and file types included in the dataset)\n", " \n", "In this example, we'll query using the same three NCBI Gene symbols and taxon name. We'll also use the --filename flag to provide a custom name for the download package. For the purposes of this demonstration, we will redirect all messages from the datasets command to datasets.log." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded:\n20K\t3_eastern_oyster_genes.zip" ] } ], "source": [ "!./datasets download gene symbol LOC111112135 LOC111112138 LOC111110223 --taxon \"crassostrea virginica\" --filename 3_eastern_oyster_genes.zip >datasets.log 2>&1\n", "!printf \"Downloaded:\\n%s\" \"$(du --human-readable 3_eastern_oyster_genes.zip)\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use the unzip command to view the contents of the gene data package." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Archive: 3_eastern_oyster_genes.zip\n Length Date Time Name\n--------- ---------- ----- ----\n 661 2021-03-21 16:47 README.md\n 19976 2021-03-21 16:47 ncbi_dataset/data/gene.fna\n 18487 2021-03-21 16:47 ncbi_dataset/data/rna.fna\n 4793 2021-03-21 16:47 ncbi_dataset/data/protein.faa\n 7195 2021-03-21 16:47 ncbi_dataset/data/data_report.jsonl\n 1783 2021-03-21 16:47 ncbi_dataset/data/data_table.tsv\n 454 2021-03-21 16:47 ncbi_dataset/data/dataset_catalog.json\n--------- -------\n 53349 7 files\n" ] } ], "source": [ "!unzip -l 
3_eastern_oyster_genes.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll extract the data files. **Note** that all NCBI Datasets packages use a similar file structure. The -o argument will overwrite existing files." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Archive: 3_eastern_oyster_genes.zip\n inflating: README.md \n inflating: ncbi_dataset/data/gene.fna \n inflating: ncbi_dataset/data/rna.fna \n inflating: ncbi_dataset/data/protein.faa \n inflating: ncbi_dataset/data/data_report.jsonl \n inflating: ncbi_dataset/data/data_table.tsv \n inflating: ncbi_dataset/data/dataset_catalog.json \n" ] } ], "source": [ "!unzip -o 3_eastern_oyster_genes.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting metadata from the Datasets gene table\n", "The Datasets gene data package contains two types of metadata files: the gene data report and the gene table. The gene data report contains detailed gene information in a hierarchical JSON Lines format. By contrast, the gene table contains a reduced, flattened representation of the hierarchical gene data report. In this step, we demonstrate how you can use common Unix commands to view metadata in the gene table." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "gene_id\tdescription\tscientific_name\tcommon_name\ttax_id\tgenomic_range\n111110223\ttoll-like receptor 6\tCrassostrea virginica\teastern oyster\t6565\tNC_035787.1:70687673-70695429\n111110223\ttoll-like receptor 6\tCrassostrea virginica\teastern oyster\t6565\tNC_035787.1:70687673-70695429\n111112135\ttoll-like receptor 13\tCrassostrea virginica\teastern oyster\t6565\tNC_035788.1:101401835-101406321\n111112135\ttoll-like receptor 13\tCrassostrea virginica\teastern oyster\t6565\tNC_035788.1:101401835-101406321\n111112138\ttoll-like receptor 4\tCrassostrea virginica\teastern oyster\t6565\tNC_035788.1:101349832-101356947\n111112138\ttoll-like receptor 4\tCrassostrea virginica\teastern oyster\t6565\tNC_035788.1:101349832-101356947\n" ] } ], "source": [ "!head ncbi_dataset/data/data_table.tsv | cut -f1,3-7" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ ">NC_035787.1:c70695429-70687673 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [chromosome=8]\nGTCGCGTGTACTCGATCTGCTGAACGCAGTATCGGTGTATAAATCATTTTGTTCTTCTCGATGAAAAAAA\nTTAGGCAAATTTGCCATCAAGTTTAAAAGCTATTCTCACTGTTTCACGCATCGGGACATTTTAAATGGAT\nTTTCCAATGCACTAGTTTCATATAAGTCTGCATACTTCCTGGTCTGTGAATAAATCAAACTTAATTATGA\nTTTCATGAAGAAATGTAATGCAATGACGAGTTGCATTTTGGAGGAATTTTGAACAGATTTTTCTGAATAA\nGCTAGAAACAATTTGTCGAAGGTATGTTTAGAATTTTTCCCGAATATTTAGAAGCTTTGCCTTTAAAATC\nATTGATTATGCAGGCCTTAATTACTCCTTCCAGTTAATGTGCATCCTTGATTGATTGGTTATATTGGCAG\nCAGTTAAACTATTCAATGACATCATAATAAGGGGATTCATGGTCAGATTTGGTGTCAATGTTCAGAAAAC\nTGTATCTACTTTCTATCTATCTGTATCTAGTTACTAAGCAAATATAATCTTCACCATCAAGTACTTATTA\nTAAGACTTACTTTAAACCTGTACATGGAATATTATACATGAAAGACATGGGACTCTACCGGTAAACAAAA\n" ] } ], "source": [ "!head --lines 10 ncbi_dataset/data/gene.fna" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { 
"output_type": "stream", "name": "stdout", "text": [ ">NC_035787.1:c70695429-70687673 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [chromosome=8]\n>NC_035788.1:101401835-101406321 LOC111112135 [organism=Crassostrea virginica] [GeneID=111112135] [chromosome=9]\n>NC_035788.1:101349832-101356947 LOC111112138 [organism=Crassostrea virginica] [GeneID=111112138] [chromosome=9]\n" ] } ], "source": [ "!grep '^>' ncbi_dataset/data/gene.fna" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ ">XM_022446624.1 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [transcript=X2]\n>XM_022446623.1 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [transcript=X1]\n>XM_022449492.1 LOC111112135 [organism=Crassostrea virginica] [GeneID=111112135] [transcript=X2]\n>XM_022449491.1 LOC111112135 [organism=Crassostrea virginica] [GeneID=111112135] [transcript=X1]\n>XM_022449500.1 LOC111112138 [organism=Crassostrea virginica] [GeneID=111112138] [transcript=X1]\n>XM_022449501.1 LOC111112138 [organism=Crassostrea virginica] [GeneID=111112138] [transcript=X2]\n" ] } ], "source": [ "!grep '^>' ncbi_dataset/data/rna.fna" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ ">XP_022302331.1 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [isoform=X1]\n>XP_022302332.1 LOC111110223 [organism=Crassostrea virginica] [GeneID=111110223] [isoform=X2]\n>XP_022305199.1 LOC111112135 [organism=Crassostrea virginica] [GeneID=111112135] [isoform=X1]\n>XP_022305200.1 LOC111112135 [organism=Crassostrea virginica] [GeneID=111112135] [isoform=X2]\n>XP_022305208.1 LOC111112138 [organism=Crassostrea virginica] [GeneID=111112138]\n>XP_022305209.1 LOC111112138 [organism=Crassostrea virginica] [GeneID=111112138]\n" ] } ], "source": [ "!grep '^>' ncbi_dataset/data/protein.faa" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Converting the JSON Lines gene data report to tabular format\n", "Next, we'll show how to use the dataformat command line tool to convert the hierarchical JSON Lines gene data report into tabular formats, including Excel and tsv. First, we'll use the --help flag to view the fields available for conversion to tabular format." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\nConvert Gene Report into TSV format.\n\nRefer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.\n\nUsage\n dataformat tsv gene [flags]\n\nExamples\n dataformat tsv gene --inputfile gene_package/ncbi_dataset/data/data_report.jsonl\n dataformat tsv gene --package genes.zip\n\nFlags\n --fields strings comma-separated list of fields\n - annotation-assemblies-in-scope-accession\n - annotation-assemblies-in-scope-name\n - annotation-release-date\n - annotation-release-name\n - chromosomes\n - common-name\n - description\n - ensembl-geneids\n - gene-id\n - gene-type\n - genomic-range-accession\n - genomic-range-range-orientation\n - genomic-range-range-start\n - genomic-range-range-stop\n - name-authority\n - name-id\n - omim-ids\n - orientation\n - ref-standard-genomic-region-type\n - replaced-gene-id\n - rna-type\n - swissprot-accessions\n - symbol\n - synonyms\n - tax-id\n - tax-name\n - transcript-accession\n - transcript-ensembl-transcript\n - transcript-genomic-location-accession\n - transcript-genomic-location-seq-name\n - transcript-length\n - transcript-name\n - transcript-protein-accession\n - transcript-protein-ensembl-protein\n - transcript-protein-isoform\n - transcript-protein-length\n - transcript-protein-mat-peptide-accession\n - transcript-protein-mat-peptide-length\n - transcript-protein-mat-peptide-name\n - 
transcript-protein-name\n - transcript-transcript-type\n -h, --help help for gene\n --inputfile string input file\n --package string datasets package (zip archive), inputfile parameter is relative to the root path inside the archive\n\n\n\nGlobal Flags\n --elide-header Do not output header\n\n" ] } ], "source": [ "!./dataformat tsv gene --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll use the dataformat tool to convert a selected set of data fields from the gene data report to tsv format. The --package flag points to the gene data package (zip archive) that contains the data report, and the --fields flag lists the fields to include as columns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "NCBI GeneID\tSymbol\tTaxonomic ID\tTaxonomic Name\n111110223\tLOC111110223\t6565\tCrassostrea virginica\n111112135\tLOC111112135\t6565\tCrassostrea virginica\n111112138\tLOC111112138\t6565\tCrassostrea virginica\n" ] } ], "source": [ "!./dataformat tsv gene --package 3_eastern_oyster_genes.zip --fields gene-id,symbol,tax-id,tax-name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Limiting the fasta download to a subset of transcript and protein sequences\n", "Now we'll show you how to limit the transcript and protein fasta files to a subset of transcripts and proteins. In this example, we'll use the --fasta-filter flag to extract sequences for the transcripts encoding the longest protein." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded:\n8.0K\t3_eastern_oyster_transcripts.zip" ] } ], "source": [ "!./datasets download gene symbol LOC111112135 LOC111112138 LOC111110223 --taxon \"crassostrea virginica\" --filename 3_eastern_oyster_transcripts.zip --fasta-filter XM_022446623.1 XM_022449491.1 XM_022449500.1 >datasets.log 2>&1\n", "!printf \"Downloaded:\\n%s\" \"$(du --human-readable 3_eastern_oyster_transcripts.zip)\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading sequence and metadata for all RefSeq genes for a given organism\n", "Finally, we'll show how to download a gene data package containing sequence and metadata for all genes for a given organism. In this example, we'll download all genes for Crassostrea virginica. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded:\n226M\teastern_oyster_genes.zip" ] } ], "source": [ "!./datasets download gene taxon \"crassostrea virginica\" --filename eastern_oyster_genes.zip >datasets.log 2>&1\n", "!printf \"Downloaded:\\n%s\" \"$(du --human-readable eastern_oyster_genes.zip)\"" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Archive: eastern_oyster_genes.zip\n Length Date Time Name\n--------- ---------- ----- ----\n 661 2021-03-21 16:47 README.md\n438079325 2021-03-21 16:47 ncbi_dataset/data/gene.fna\n187895654 2021-03-21 16:48 ncbi_dataset/data/rna.fna\n 45124543 2021-03-21 16:52 ncbi_dataset/data/protein.faa\n135988248 2021-03-21 16:56 ncbi_dataset/data/data_report.jsonl\n 17830664 2021-03-21 16:59 ncbi_dataset/data/data_table.tsv\n 454 2021-03-21 17:00 ncbi_dataset/data/dataset_catalog.json\n--------- -------\n824919549 7 files\n" ] } ], "source": [ "!unzip -l 
eastern_oyster_genes.zip" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1-final" } }, "nbformat": 4, "nbformat_minor": 4 }