{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting genome assembly data using NCBI Datasets command line tools\n", "\n", "The objective of this notebook is to demonstrate how to use the NCBI Datasets command line tools to explore and download genome assembly sequence and metadata. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting started \n", "First, we'll download the datasets command line tools and grant them execute permissions. \n", "Datasets has two command line tools: \n", "- The **datasets** tool is used to query and download sequence, annotation and metadata for all domains of life.\n", "- The **dataformat** tool is used to convert metadata downloaded from NCBI Datasets from JSON Lines format to other formats." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloading CLI tools...\n[size: 11M] datasets v11.7.0\n[size: 13M] dataformat v11.7.0\n" ] } ], "source": [ "%%bash\n", "printf \"Downloading CLI tools...\\n\"\n", "for app in datasets dataformat\n", "do\n", " curl --silent --remote-name \"https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/${app}\"\n", " chmod +x \"${app}\"\n", " printf \"[size: %s] %s v%s\\n\" $(du --human-readable \"${app}\") $(./\"${app}\" version)\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also download the command line tool [jq](https://stedolan.github.io/jq/) to parse the datasets JSON Lines data reports into a readable format." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded jq-1.6" ] } ], "source": [ "%%bash\n", "curl --silent --location --output jq 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64'\n", "chmod +x jq\n", "printf \"Downloaded %s\" $(./jq --version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting help\n", "To get help with the tools or any of their subcommands, specify `--help` after the command:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "datasets is a command-line tool that is used to query and download biological sequence data\nacross all domains of life from NCBI databases.\n\nRefer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.\n\nUsage\n datasets [command]\n\nData Retrieval Commands\n summary print a summary of a gene or genome dataset\n download download a gene, genome or coronavirus dataset as a zip file\n rehydrate rehydrate a downloaded, dehydrated dataset\n\nMiscellaneous Commands\n completion generate autocompletion scripts\n version print the version of this client and exit\n help Help about any command\n\nFlags\n -h, --help help for datasets\n\nUse datasets help for detailed help about a command.\n" ] } ], "source": [ "!./datasets --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting genome metadata\n", "\n", "To begin, we'll use the `datasets summary genome` command to explore all the available RefSeq genomes for a group of organisms.\n", "\n", "Genome summaries can be requested in four ways:\n", "\n", "- accession: an NCBI Assembly accession\n", "- organism: an organism or taxonomic group name\n", "- taxid: an NCBI Taxonomy identifier, at any level\n", "- BioProject: an 
NCBI BioProject accession\n", "\n", "In this example, we'll view metadata for all Crustacea genome assemblies by taxon name. Additionally, we'll limit our search to genomes annotated by NCBI's RefSeq group using the `--refseq` flag. To make the JSON output easy to read, we'll use the command line parser jq. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assemblies\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"8160265\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"60912986\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"15723551\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"5579866\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " 
\u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6762708\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Mar 16, 2020\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Daphnia_magna/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_003990815.1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA490418\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Daphnia magna strain:SK Genome sequencing and assembly\"\u001b[0m\u001b[1;39m\n", 
" \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"LG1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG2\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG3\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG4\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG5\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG6\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG7\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG8\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG9\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"LG10\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"MT\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m14466\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"ASM399081v1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"131337948\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"35525\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6668\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Daphnia magna\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sex\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"pooled male and female\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"strain\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SK\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"35525\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Daphnia magna\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"122937721\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2019-01-07\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"10610954\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"175309680\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"15496064\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6791917\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"9576030\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Dec 21, 2017\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Eurytemora_affinis/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_000591075.1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Scaffold\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA203087\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA163973\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Eurytemora affinis strain:Atlantic clade Genome sequencing and assembly\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA163973\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA163993\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"i5k Arthropod Genome Pilot Project\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA163993\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"i5k initiative\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m67724\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"Eaff_2.0\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"324965786\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"88015\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"88014\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Eurytemora affinis\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"strain\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Atlantic clade\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"88015\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Eurytemora affinis\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"389032277\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2017-12-12\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"8126944\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"330164727\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"13113008\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"5472117\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"7246018\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Nov 04, 2020\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Pollicipes_pollicipes/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_011947565.2\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Scaffold\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA614970\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA649812\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Pollicipes pollicipes isolate:AB1234 Genome sequencing and assembly\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA649812\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA533106\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"The Global Invertebrate Genomics 
Alliance (GIGA) genomes and transcriptomes\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA533106\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Earth BioGenome Project (EBP)\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m109725\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Ppol_2\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"597159620\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"isolate\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"AB1234\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"41117\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"merged_tax_ids\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"223993\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"36136\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Pollicipes pollicipes\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"41117\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Pollicipes pollicipes\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"770089732\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2020-10-27\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"11302958\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"828679130\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"18240609\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"7235079\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"8792038\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Nov 19, 2020\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Penaeus_monodon/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_015228065.1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Chromosome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " 
\u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA611030\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Genomic sequences of Penaeus monodon\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"2\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"3\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"4\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"5\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"6\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"7\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"8\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"9\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"10\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"11\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"12\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"13\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"14\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"15\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"16\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"17\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"18\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"19\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"20\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"21\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"22\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"23\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"24\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"25\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"26\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"27\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"28\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"29\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0;32m\"30\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"31\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"32\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"33\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"34\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"35\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"36\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"37\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"38\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"39\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"40\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"41\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"42\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"43\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"44\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"MT\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m45084\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NSTDA_Pmon_1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"1407275500\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m4\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m4\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"common_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"black tiger shrimp\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"isolate\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SGIC_2016\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6687\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"133894\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Penaeus monodon\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6687\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"black tiger shrimp\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2394331783\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2020-11-05\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"10526977\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"618319281\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"18152621\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"7313578\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"8491987\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Dec 07, 2018\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Penaeus_vannamei/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_003789085.1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Scaffold\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: 
\u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA438564\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Penaeus vannamei breed:Keihai No. 1 Genome sequencing and assembly\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m,\n", " \u001b[0;32m\"MT\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m86864\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"ASM378908v1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"1534800868\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m3\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"breed\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Kehai No.1\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"common_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Pacific white shrimp\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6689\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"merged_tax_ids\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"583111\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"133894\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Penaeus vannamei\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sex\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"male\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6689\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Pacific white shrimp\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"1663565311\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2018-11-16\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"annotation_metadata\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"file\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6482984\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"249136834\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GBFF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"15862279\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"RNA_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"6535574\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PROT_FASTA\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"5624279\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GENOME_GTF\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI Annotation Release 100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Sep 13, 2016\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"release_number\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"100\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"report_url\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Hyalella_azteca/100/\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"source\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NCBI\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"GCF_000764305.1\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"assembly_category\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"representative genome\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"assembly_level\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Scaffold\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"bioproject_lineages\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"bioprojects\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA243935\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA163973\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Hyalella azteca isolate:HAZT.00-mixed Genome sequencing and assembly\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA163973\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_accessions\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"PRJNA163993\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"i5k Arthropod Genome Pilot Project\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"accession\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"PRJNA163993\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"i5k initiative\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"chromosomes\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[0;32m\"Un\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"contig_n50\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m114415\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"display_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Hazt_2.0\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"estimated_size\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"442600481\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"org\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"assembly_counts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"node\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"subtree\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m2\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"isolate\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"HAZT.00-mixed\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"294128\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"parent_tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"199487\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"SPECIES\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"sci_name\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Hyalella azteca\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"tax_id\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"294128\"\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"title\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"Hyalella azteca\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"seq_length\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"550885727\"\u001b[0m\u001b[1;39m,\n", " 
\u001b[0m\u001b[34;1m\"submission_date\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"2016-07-20\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"total_count\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m6\u001b[0m\u001b[1;39m\n", "\u001b[1;39m}\u001b[0m\n" ] } ], "source": [ "!./datasets summary genome taxon Crustacea --refseq | ./jq ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get only the count of available RefSeq (GCF) genomes that fall under a particular taxonomic name, use the --refseq flag and set --limit to NONE:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "{\"total_count\":6}\n" ] } ], "source": [ "!./datasets summary genome taxon crustacea --refseq --limit NONE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading genome assembly sequence and metadata \n", "In this section, we'll show you how to download a genome data package for one of the crustacean genomes using the datasets download genome command. 
Genome data packages can be retrieved in four ways:\n", "\n", "- accession: an NCBI Assembly accession\n", "- organism: an organism or a taxonomic group name\n", "- taxid: an NCBI Taxonomy identifier, at any level\n", "- BioProject: an NCBI BioProject accession\n", "\n", "The default genome data package includes the following data (when available):\n", "\n", "- genomic sequence (genomic.fna)\n", "- transcript sequences (rna.fna)\n", "- protein sequences (protein.faa)\n", "- annotation in GFF3 format (genomic.gff)\n", "- a data report containing genome assembly and annotation metadata (assembly_data_report.jsonl)\n", "- a sequence report listing the nucleotide sequences that comprise the genome assembly (sequence_report.jsonl)\n", "\n", "In this example, we'll download the Datasets genome data package for Penaeus vannamei (Pacific white shrimp). Because the query is by taxon, the package contains every available assembly for the species, including the RefSeq representative genome. For the purposes of this demonstration, we redirect all messages from the datasets command to datasets.log." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloaded:\n901M\tpacific_white_shrimp.zip" ] } ], "source": [ "!./datasets download genome taxon \"penaeus vannamei\" --filename pacific_white_shrimp.zip >datasets.log 2>&1\n", "!printf \"Downloaded:\\n%s\" \"$(du --human-readable pacific_white_shrimp.zip)\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converting the Datasets assembly data report to tabular format\n", "The Datasets genome assembly data report can be converted to tabular format using the dataformat tool. 
In this step, we'll use the --help flag to view the data fields available for conversion:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\nConvert Genome Assembly Data Report into TSV format.\n\nRefer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.\n\nUsage\n dataformat tsv genome [flags]\n\nExamples\n dataformat tsv genome --inputfile human/ncbi_dataset/data/assembly_data_report.jsonl\n dataformat tsv genome --package human.zip\n\nFlags\n --fields strings comma-separated list of fields\n - annotinfo-featcount-gene-non-coding\n - annotinfo-featcount-gene-other\n - annotinfo-featcount-gene-protein-coding\n - annotinfo-featcount-gene-pseudogene\n - annotinfo-featcount-gene-total\n - annotinfo-name\n - annotinfo-release-date\n - annotinfo-report-url\n - annotinfo-source\n - assminfo-bioproject-lineage-accession\n - assminfo-bioproject-lineage-parent-accession\n - assminfo-bioproject-lineage-parent-accessions\n - assminfo-bioproject-lineage-title\n - assminfo-biosample-accession\n - assminfo-description\n - assminfo-genbank-assm-accession\n - assminfo-level\n - assminfo-linked-assm\n - assminfo-name\n - assminfo-refseq-assm-accession\n - assminfo-refseq-category\n - assminfo-sequencing-tech\n - assminfo-submission-date\n - assminfo-submitter\n - assminfo-type\n - assminfo-ucsc-assm-name\n - assmstats-contig-l50\n - assmstats-contig-n50\n - assmstats-gaps-between-scaffolds-count\n - assmstats-number-of-component-sequences\n - assmstats-number-of-contigs\n - assmstats-number-of-scaffolds\n - assmstats-scaffold-l50\n - assmstats-scaffold-n50\n - assmstats-total-number-of-chromosomes\n - assmstats-total-sequence-len\n - assmstats-total-ungapped-len\n - breed\n - common-name\n - cultivar\n - ecotype\n - isolate\n - organelle-assembly-name\n - 
organelle-bioproject-accessions\n - organelle-description\n - organelle-infraspecific-name\n - organelle-submitter\n - organelle-total-seq-length\n - organism-name\n - sex\n - strain\n - tax-id\n - wgs-contigs-url\n - wgs-project-accession\n - wgs-url\n -h, --help help for genome\n --inputfile string input file\n --package string datasets package (zip archive), inputfile parameter is relative to the root path inside the archive\n\n\n\nGlobal Flags\n --elide-header Do not output header\n\n" ] } ], "source": [ "!./dataformat tsv genome --help" ] }, { "source": [ "Let's look at the catalog of files inside the package, using jq to convert the catalog JSON into an easy-to-read CSV listing of file paths and file types." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\"GCA_003730335.1/GCA_003730335.1_ASM373033v1_genomic.fna\",\"GENOMIC_NUCLEOTIDE_FASTA\"\n\"GCA_003730335.1/sequence_report.jsonl\",\"SEQUENCE_REPORT\"\n\"GCA_003789085.1/GCA_003789085.1_ASM378908v1_genomic.fna\",\"GENOMIC_NUCLEOTIDE_FASTA\"\n\"GCA_003789085.1/genomic.gff\",\"GFF3\"\n\"GCA_003789085.1/protein.faa\",\"PROTEIN_FASTA\"\n\"GCA_003789085.1/sequence_report.jsonl\",\"SEQUENCE_REPORT\"\n\"GCF_003789085.1/GCF_003789085.1_ASM378908v1_genomic.fna\",\"GENOMIC_NUCLEOTIDE_FASTA\"\n\"GCF_003789085.1/genomic.gff\",\"GFF3\"\n\"GCF_003789085.1/protein.faa\",\"PROTEIN_FASTA\"\n\"GCF_003789085.1/rna.fna\",\"RNA_NUCLEOTIDE_FASTA\"\n\"GCF_003789085.1/sequence_report.jsonl\",\"SEQUENCE_REPORT\"\n\"assembly_data_report.jsonl\",\"DATA_REPORT\"\n" ] } ], "source": [ "!./dataformat catalog --package pacific_white_shrimp.zip 2>/dev/null | ./jq -r '.assemblies[] | .files[] | [.filePath, .fileType] | @csv'" ] }, { "source": [ "Now we'll use the dataformat tool to convert a selected set of data fields, chosen with the --fields flag, into TSV format." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Assembly Name\tAssembly RefSeq Accession\tAssembly GenBank Accession\tAssembly Refseq Dategory\tAssembly Stats Number of Contigs\tAssembly Stats Number of Scaffolds\nASM373033v1\tna\tGCA_003730335.1\tna\t19584\t19584\nASM378908v1\tGCF_003789085.1\tGCA_003789085.1\trepresentative genome\t33019\t4682\nASM378908v1\tGCF_003789085.1\tGCA_003789085.1\trepresentative genome\t33019\t4682\n" ] } ], "source": [ "!./dataformat tsv genome --package pacific_white_shrimp.zip --fields assminfo-name,assminfo-refseq-assm-accession,assminfo-genbank-assm-accession,assminfo-refseq-category,assmstats-number-of-contigs,assmstats-number-of-scaffolds" ] }, { "source": [ "Next, we can list the first 30 FASTA deflines for the ASM378908v1 RefSeq assembly:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ ">NW_020868286.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1, whole genome shotgun sequence\n", ">NW_020868287.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_10, whole genome shotgun sequence\n", ">NW_020868288.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_100, whole genome shotgun sequence\n", ">NW_020868289.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1000, whole genome shotgun sequence\n", ">NW_020868290.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1001, whole genome shotgun sequence\n", ">NW_020868291.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1002, whole genome shotgun sequence\n", ">NW_020868292.1 Penaeus vannamei breed Kehai No.1 
unplaced genomic scaffold, ASM378908v1 LVANscaffold_1003, whole genome shotgun sequence\n", ">NW_020868293.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1004, whole genome shotgun sequence\n", ">NW_020868294.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1005, whole genome shotgun sequence\n", ">NW_020868295.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1006, whole genome shotgun sequence\n", ">NW_020868296.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1007, whole genome shotgun sequence\n", ">NW_020868297.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1008, whole genome shotgun sequence\n", ">NW_020868298.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1009, whole genome shotgun sequence\n", ">NW_020868299.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_101, whole genome shotgun sequence\n", ">NW_020868300.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1010, whole genome shotgun sequence\n", ">NW_020868301.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1011, whole genome shotgun sequence\n", ">NW_020868302.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1012, whole genome shotgun sequence\n", ">NW_020868303.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1013, whole genome shotgun sequence\n", ">NW_020868304.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1014, whole genome shotgun sequence\n", ">NW_020868305.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1015, whole genome shotgun sequence\n", ">NW_020868306.1 Penaeus vannamei breed Kehai 
No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1016, whole genome shotgun sequence\n", ">NW_020868307.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1017, whole genome shotgun sequence\n", ">NW_020868308.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1018, whole genome shotgun sequence\n", ">NW_020868309.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1019, whole genome shotgun sequence\n", ">NW_020868310.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_102, whole genome shotgun sequence\n", ">NW_020868311.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1020, whole genome shotgun sequence\n", ">NW_020868312.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1021, whole genome shotgun sequence\n", ">NW_020868313.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1022, whole genome shotgun sequence\n", ">NW_020868314.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1023, whole genome shotgun sequence\n", ">NW_020868315.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1024, whole genome shotgun sequence\n" ] } ], "source": [ "!unzip -q -c pacific_white_shrimp.zip ncbi_dataset/data/GCF_003789085.1/GCF_003789085.1_ASM378908v1_genomic.fna | grep --max-count=30 '^>'" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1-final" } }, "nbformat": 4, "nbformat_minor": 4 }