{ "cells": [ { "cell_type": "markdown", "id": "f791b70f", "metadata": {}, "source": [ "# oggmap: Step 1 - get taxonomic information\n", "\n", "This notebook demonstrates how to get taxonomic information for your query species with `oggmap`.\n", "\n", "Given a species name or taxonomic ID, `oggmap` version `v0.0.1` extracts the query species lineage information with the `ete3` Python toolkit and the `NCBI taxonomy` ([Huerta-Cepas et al., 2016](https://doi.org/10.1093/molbev/msw046)). In `oggmap` version `v0.0.2`, the taxonomic information is extracted with `taxadb2` (see [taxadb2](https://pypi.org/project/taxadb2/) for more information). This information is needed alongside the taxonomic classifications for all species used in the OrthoFinder comparison." ] }, { "cell_type": "markdown", "id": "b722f9d1", "metadata": {}, "source": [ "__Note:__ If you need to download or update the NCBI taxonomy database via the `ete3` Python package and `oggmap` version `v0.0.1`, please use the `oggmap` command line function [ncbitax](https://oggmap.readthedocs.io/en/latest/tutorials/commandline.ncbitax.html) or run the following code:" ] }, { "cell_type": "raw", "id": "07acee84", "metadata": {}, "source": [ "# command line\n", "oggmap ncbitax -u\n", "# import submodule\n", "from oggmap import ncbitax\n", "ncbitax.update_ncbi()" ] }, { "cell_type": "markdown", "id": "6427785c", "metadata": {}, "source": [ "__Note:__ With `oggmap` version `v0.0.2`, the NCBI taxonomy database is downloaded or updated via the `taxadb2` Python package. 
Please use the `oggmap` command line function [ncbitax](https://oggmap.readthedocs.io/en/latest/tutorials/commandline.ncbitax.html) or run the following code:" ] }, { "cell_type": "raw", "id": "88c15ebd", "metadata": {}, "source": [ "# command line\n", "oggmap ncbitax -u -outdir taxadb -t taxa -dbname taxadb.sqlite\n", "# import submodule\n", "import sys\n", "from oggmap import ncbitax\n", "outdir = 'taxadb'\n", "dbname = 'taxadb.sqlite'\n", "sys.argv = ['ncbitax', '-u', '-outdir', outdir, '-t', 'taxa', '-dbname', dbname]\n", "update_parser = ncbitax.define_parser()\n", "update_args, unknown_args = update_parser.parse_known_args()\n", "ncbitax.update_ncbi(update_args)" ] }, { "cell_type": "markdown", "id": "3087d74b", "metadata": {}, "source": [ "## Notebook file\n", "\n", "The notebook file can be obtained here:\n", "\n", "[https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/query_lineage.ipynb](https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/query_lineage.ipynb)" ] }, { "cell_type": "markdown", "id": "4aeb52cf", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "c437db69", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from statannot import add_stat_annotation\n", "# increase dpi\n", "%matplotlib inline\n", "#plt.rcParams['figure.dpi'] = 300\n", "#plt.rcParams['savefig.dpi'] = 300\n", "plt.rcParams['figure.figsize'] = [6, 4.5]\n", "#plt.rcParams['figure.figsize'] = [4.4, 3.3]" ] }, { "cell_type": "markdown", "id": "3b5c6174", "metadata": {}, "source": [ "## Import oggmap python package submodules" ] }, { "cell_type": "code", "execution_count": 2, "id": "c8e631aa", "metadata": {}, "outputs": [], "source": [ "# import submodules\n", "from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets, ncbitax" ] }, { "cell_type": 
"markdown", "id": "b49a7a9d", "metadata": {}, "source": [ "## Get query species taxonomic lineage information\n", "\n", "The `oggmap` submodule `qlin` helps to get taxonomic information for you with the `qlin.get_qlin()` function as follows:" ] }, { "cell_type": "code", "execution_count": 3, "id": "cb83ffa1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query name: Caenorhabditis elegans\n", "query taxID: 6239\n", "query kingdom: Eukaryota\n", "query lineage names: \n", "['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)', 'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)', 'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)', 'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']\n", "query lineage: \n", "[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]\n" ] } ], "source": [ "# get query species taxonomic lineage information\n", "query_lineage = qlin.get_qlin(q='Caenorhabditis elegans', dbname='taxadb.sqlite')" ] }, { "cell_type": "markdown", "id": "44618b4b", "metadata": {}, "source": [ "The `query_lineage` variable now contains the following information in a list:\n", "- query name `query_lineage[0]`\n", "- query taxID `query_lineage[1]`\n", "- query lineage `query_lineage[2]`\n", "- query lineage dictionary `query_lineage[3]`\n", "- query lineage zip `query_lineage[4]`\n", "- query lineage names `query_lineage[5]`\n", "- reverse query lineage `query_lineage[6]`\n", "- query kingdom `query_lineage[7]`" ] }, { "cell_type": "code", "execution_count": 4, "id": "bc18d648", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Caenorhabditis elegans'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"#query name\n", "query_lineage[0]" ] }, { "cell_type": "code", "execution_count": 5, "id": "81911f0d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6239" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query taxID\n", "query_lineage[1]" ] }, { "cell_type": "code", "execution_count": 6, "id": "584d95f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1,\n", " 131567,\n", " 2759,\n", " 33154,\n", " 33208,\n", " 6072,\n", " 33213,\n", " 33317,\n", " 1206794,\n", " 6231,\n", " 119089,\n", " 6236,\n", " 2301116,\n", " 2301119,\n", " 55879,\n", " 6243,\n", " 55885,\n", " 6237,\n", " 6239]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage\n", "query_lineage[2]" ] }, { "cell_type": "code", "execution_count": 7, "id": "53a9e6aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: 'root',\n", " 131567: 'cellular organisms',\n", " 2759: 'Eukaryota',\n", " 33154: 'Opisthokonta',\n", " 33208: 'Metazoa',\n", " 6072: 'Eumetazoa',\n", " 33213: 'Bilateria',\n", " 33317: 'Protostomia',\n", " 1206794: 'Ecdysozoa',\n", " 6231: 'Nematoda',\n", " 119089: 'Chromadorea',\n", " 6236: 'Rhabditida',\n", " 2301116: 'Rhabditina',\n", " 2301119: 'Rhabditomorpha',\n", " 55879: 'Rhabditoidea',\n", " 6243: 'Rhabditidae',\n", " 55885: 'Peloderinae',\n", " 6237: 'Caenorhabditis',\n", " 6239: 'Caenorhabditis elegans'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage dictionary\n", "query_lineage[3]" ] }, { "cell_type": "code", "execution_count": 8, "id": "70f24083", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, 'root'),\n", " (131567, 'cellular organisms'),\n", " (2759, 'Eukaryota'),\n", " (33154, 'Opisthokonta'),\n", " (33208, 'Metazoa'),\n", " (6072, 'Eumetazoa'),\n", " (33213, 'Bilateria'),\n", " (33317, 'Protostomia'),\n", " (1206794, 'Ecdysozoa'),\n", " (6231, 'Nematoda'),\n", " 
(119089, 'Chromadorea'),\n", " (6236, 'Rhabditida'),\n", " (2301116, 'Rhabditina'),\n", " (2301119, 'Rhabditomorpha'),\n", " (55879, 'Rhabditoidea'),\n", " (6243, 'Rhabditidae'),\n", " (55885, 'Peloderinae'),\n", " (6237, 'Caenorhabditis'),\n", " (6239, 'Caenorhabditis elegans')]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage zip\n", "query_lineage[4]" ] }, { "cell_type": "code", "execution_count": 9, "id": "6c911c5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | PSnum | PStaxID | PSname |\n", "
|---|---|---|---|\n", "
| 0 | 0 | 1 | root |\n", "
| 1 | 1 | 131567 | cellular organisms |\n", "
| 2 | 2 | 2759 | Eukaryota |\n", "
| 3 | 3 | 33154 | Opisthokonta |\n", "
| 4 | 4 | 33208 | Metazoa |\n", "
| 5 | 5 | 6072 | Eumetazoa |\n", "
| 6 | 6 | 33213 | Bilateria |\n", "
| 7 | 7 | 33317 | Protostomia |\n", "
| 8 | 8 | 1206794 | Ecdysozoa |\n", "
| 9 | 9 | 6231 | Nematoda |\n", "
| 10 | 10 | 119089 | Chromadorea |\n", "
| 11 | 11 | 6236 | Rhabditida |\n", "
| 12 | 12 | 2301116 | Rhabditina |\n", "
| 13 | 13 | 2301119 | Rhabditomorpha |\n", "
| 14 | 14 | 55879 | Rhabditoidea |\n", "
| 15 | 15 | 6243 | Rhabditidae |\n", "
| 16 | 16 | 55885 | Peloderinae |\n", "
| 17 | 17 | 6237 | Caenorhabditis |\n", "
| 18 | 18 | 6239 | Caenorhabditis elegans |\n", "