{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "SRA_analysis",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ipPtJp7-b310"
},
"source": [
"# Obtaining SRA metadata for SARS-CoV-2\n",
"------\n",
"Here we select the \"best\" datasets for reanalysis using best-practice Galaxy SARS-CoV-2 workflows. The first step is to go to https://www.ncbi.nlm.nih.gov/sra and run a query with the following search term: `txid2697049[Organism:noexp]`.\n",
"\n",
"Next, download the search results using the `Send to:` menu, selecting `File` and then `RunInfo`. The resulting CSV file is loaded into pandas below."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IcOEx6BKK2CQ"
},
"source": [
"!pip3 install datapane"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Zd_rF6Oi_J2r"
},
"source": [
"# Set your datapane.com token here\n",
"import datapane as dp\n",
"dp.login(token=\"xxxxx\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "lax0gNVqM8pq"
},
"source": [
"import pandas as pd\n",
"pd.set_option('display.max_rows', 500)\n",
"pd.set_option('display.max_columns', 500)\n",
"pd.set_option('display.width', 1000)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "txFIYbuNN-GT",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "90aa70a3-3ea9-4a2f-d206-34b06e1148ae"
},
"source": [
"!pip3 install -U pandasql"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already up-to-date: pandasql in /usr/local/lib/python3.6/dist-packages (0.7.3)\n",
"Requirement already satisfied, skipping upgrade: numpy in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.19.5)\n",
"Requirement already satisfied, skipping upgrade: pandas in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.1.5)\n",
"Requirement already satisfied, skipping upgrade: sqlalchemy in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.3.23)\n",
"Requirement already satisfied, skipping upgrade: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->pandasql) (2018.9)\n",
"Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas->pandasql) (2.8.1)\n",
"Requirement already satisfied, skipping upgrade: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.7.3->pandas->pandasql) (1.15.0)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "zgjRl2Iw9tJu"
},
"source": [
"from pandasql import sqldf\n",
"# Helper: run an SQL query against the DataFrames defined in this notebook's global namespace\n",
"pysqldf = lambda q: sqldf(q, globals())"
],
"execution_count": null,
"outputs": []
},
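{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `pysqldf` helper above passes an SQL string to `pandasql.sqldf`, which runs it with SQLite against the DataFrames found in the supplied namespace (here `globals()`). As a minimal sketch of the query pattern used below, the next cell builds a small made-up table and counts rows and distinct BioProjects per platform. The name `toy` and its values are illustrative assumptions, not SRA data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Toy table for illustration only; the values are made up\n",
"toy = pd.DataFrame({\n",
"    'BioProject': ['P1', 'P1', 'P2', 'P3'],\n",
"    'Platform': ['ILLUMINA', 'ILLUMINA', 'ILLUMINA', 'OXFORD_NANOPORE'],\n",
"})\n",
"# Same query pattern that is applied to `ncbi` later in the notebook\n",
"pysqldf('select Platform, count(*) as N, count(distinct BioProject) as P from toy group by Platform')"
],
"execution_count": null,
"outputs": []
},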
{
"cell_type": "markdown",
"metadata": {
"id": "_NcO_3fu1a97"
},
"source": [
"## Processing NCBI metadata\n",
"\n",
"The metadata is obtained directly from the SRA website by selecting all SRA datasets for `txid` `2697049`, saving the results as a `RunInfo` table, compressing it, and uploading it to this notebook."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7cqmUaLQAvKO"
},
"source": [
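"# Load the gzip-compressed RunInfo table; pandas infers the compression from the .gz extension\n",
"ncbi = pd.read_csv('https://github.com/galaxyproject/SARS-CoV-2/raw/master/data/var/SRA_Jan20_2021.csv.gz')"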
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "HJL0oX7kmkoT",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ac7c1fa7-8c1c-43fe-fce1-88808222ce69"
},
"source": [
"print(ncbi.columns)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Index(['Run', 'ReleaseDate', 'LoadDate', 'spots', 'bases', 'spots_with_mates', 'avgLength', 'size_MB', 'AssemblyName', 'download_path', 'Experiment', 'LibraryName', 'LibraryStrategy', 'LibrarySelection', 'LibrarySource', 'LibraryLayout', 'InsertSize', 'InsertDev', 'Platform', 'Model', 'SRAStudy', 'BioProject', 'Study_Pubmed_id', 'ProjectID', 'Sample', 'BioSample', 'SampleType', 'TaxID', 'ScientificName', 'SampleName', 'g1k_pop_code', 'source', 'g1k_analysis_group', 'Subject_ID', 'Sex', 'Disease', 'Tumor', 'Affection_Status', 'Analyte_Type', 'Histological_Type', 'Body_Site', 'CenterName', 'Submission', 'dbgap_study_accession', 'Consent', 'RunHash', 'ReadHash'], dtype='object')\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "r9-AZsIOrdA7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "cbf72a46-003a-4a07-d2b0-38fae610005d"
},
"source": [
"pysqldf('select count(distinct BioProject) from ncbi')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" count(distinct BioProject)\n",
"0 192"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "4P98Fgojr60q",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "f8a257aa-a552-4a1e-a4f2-f62e2fb3157f"
},
"source": [
"pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\"')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" count(distinct BioProject)\n",
"0 149"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "8JFDv0QgsG-i",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "9c4b88b1-4fd2-499a-ce6e-6f0262dc39e7"
},
"source": [
"pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\"')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" count(distinct BioProject)\n",
"0 33"
]
},
"metadata": {
"tags": []
},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "nF8XjvAbs3Sv",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "c6578f49-a0ef-4297-d31b-d44710509e92"
},
"source": [
"pysqldf('select count(*) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\"')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" count(*)\n",
"0 3351"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "XhFOvlI9sprE",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"outputId": "c8de0270-ff0e-4e0b-8a78-4a0b7a89d1ec"
},
"source": [
"pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\"')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" count(distinct BioProject)\n",
"0 31"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1RWPQfg-tUot",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "2c2b46f3-7a97-4e9a-f7ee-37211cb8da99"
},
"source": [
"pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\" group by BioProject order by N desc')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" BioProject ReleaseDate N\n",
"0 PRJNA622837 2020-06-08 14:49:42 1564\n",
"1 PRJNA612578 2020-03-17 01:31:51 964\n",
"2 PRJNA650245 2020-08-19 16:26:12 617\n",
"3 PRJNA610428 2020-06-06 00:29:31 42\n",
"4 PRJEB38546 2020-10-17 18:53:39 26\n",
"5 PRJNA634356 2020-09-27 17:31:30 25\n",
"6 PRJNA650134 2020-08-01 14:16:35 22\n",
"7 PRJNA661544 2020-11-19 20:02:39 15\n",
"8 PRJNA638211 2020-07-31 11:46:07 10\n",
"9 PRJNA605983 2020-02-15 11:40:11 9\n",
"10 PRJNA605907 2020-02-22 21:16:12 8\n",
"11 PRJNA616446 2020-04-02 00:08:41 7\n",
"12 PRJNA615319 2020-04-01 02:23:15 6\n",
"13 PRJNA639591 2020-07-07 00:15:17 5\n",
"14 PRJNA673055 2020-11-10 23:37:43 5\n",
"15 PRJNA634194 2020-05-20 22:28:37 4\n",
"16 PRJNA623001 2020-12-06 18:25:32 3\n",
"17 PRJNA644357 2020-09-01 17:09:18 3\n",
"18 PRJNA636446 2020-06-01 16:48:32 2\n",
"19 PRJNA643574 2020-07-02 07:59:17 2\n",
"20 PRJNA669553 2020-10-19 12:17:41 2\n",
"21 PRJEB38459 2020-07-24 15:14:14 1\n",
"22 PRJEB39737 2020-08-12 08:55:30 1\n",
"23 PRJEB41216 2020-11-13 18:40:39 1\n",
"24 PRJNA608651 2020-02-25 09:00:40 1\n",
"25 PRJNA623895 2020-04-10 09:13:31 1\n",
"26 PRJNA624231 2020-05-06 17:28:09 1\n",
"27 PRJNA625669 2020-04-17 01:08:31 1\n",
"28 PRJNA630716 2020-09-09 19:43:23 1\n",
"29 PRJNA658242 2020-08-20 13:10:27 1\n",
"30 PRJNA689000 2021-01-01 16:17:28 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "zgypVKryKqsJ",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "46834d00-5e8e-4817-a9a5-4d080d71b64f"
},
"source": [
"pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"AMPLICON\" and LibraryLayout=\"PAIRED\" group by BioProject order by ReleaseDate,N desc')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" BioProject ReleaseDate N\n",
"0 PRJNA614546 2020-03-23 22:30:33 24\n",
"1 PRJNA613958 2020-03-24 02:56:38 14860\n",
"2 PRJNA614995 2020-03-24 19:53:13 3967\n",
"3 PRJNA622817 2020-04-05 03:54:10 18\n",
"4 PRJNA623683 2020-04-09 04:58:14 1\n",
"5 PRJNA616147 2020-04-14 21:13:31 4\n",
"6 PRJNA625551 2020-04-17 16:26:15 1163\n",
"7 PRJNA627229 2020-04-23 05:01:42 11\n",
"8 PRJEB37886 2020-05-01 12:26:32 104984\n",
"9 PRJNA629891 2020-05-02 02:12:35 2\n",
"10 PRJNA631042 2020-05-12 12:50:25 37\n",
"11 PRJNA627662 2020-05-13 11:13:04 112\n",
"12 PRJNA633948 2020-05-24 21:40:17 204\n",
"13 PRJNA629889 2020-05-27 19:14:35 18\n",
"14 PRJNA636446 2020-06-01 16:50:40 7\n",
"15 PRJNA634119 2020-06-15 21:00:36 49\n",
"16 PRJEB38369 2020-06-17 19:16:32 18\n",
"17 PRJEB38723 2020-06-21 15:16:39 542\n",
"18 PRJNA639066 2020-06-26 11:25:40 1931\n",
"19 PRJNA636748 2020-06-29 14:57:15 516\n",
"20 PRJNA643575 2020-07-07 08:37:40 85\n",
"21 PRJNA647529 2020-07-21 04:20:18 212\n",
"22 PRJNA647448 2020-07-21 12:31:35 7\n",
"23 PRJNA649101 2020-07-30 07:49:28 24\n",
"24 PRJNA645906 2020-07-30 10:24:28 2286\n",
"25 PRJNA656534 2020-08-11 16:24:59 567\n",
"26 PRJNA656695 2020-08-12 08:07:18 171\n",
"27 PRJNA610428 2020-08-18 20:43:08 20\n",
"28 PRJNA650037 2020-08-20 13:10:22 60\n",
"29 PRJNA662589 2020-09-23 00:00:28 9\n",
"30 PRJEB40443 2020-09-24 13:54:32 346\n",
"31 PRJNA666543 2020-09-30 09:13:49 16\n",
"32 PRJEB39887 2020-10-05 09:34:44 468\n",
"33 PRJEB38546 2020-10-17 18:53:39 26\n",
"34 PRJNA670222 2020-10-20 16:37:16 2\n",
"35 PRJEB40394 2020-10-22 12:06:40 48\n",
"36 PRJEB39849 2020-10-30 14:40:23 420\n",
"37 PRJNA673096 2020-11-01 13:09:09 246\n",
"38 PRJNA673341 2020-11-01 16:14:27 7\n",
"39 PRJEB40188 2020-11-03 16:12:51 80\n",
"40 PRJNA679460 2020-11-19 13:16:02 208\n",
"41 PRJNA681234 2020-11-28 02:21:13 9\n",
"42 PRJNA681574 2020-11-30 14:53:35 151\n",
"43 PRJNA679980 2020-12-07 00:00:23 6\n",
"44 PRJNA686083 2020-12-20 07:35:42 81\n",
"45 PRJEB42024 2020-12-21 10:46:36 539\n",
"46 PRJNA685400 2020-12-22 05:15:23 28\n",
"47 PRJNA682735 2020-12-24 17:50:34 315\n",
"48 PRJNA665485 2020-12-31 00:07:20 8\n",
"49 PRJNA669553 2020-12-31 00:11:13 122\n",
"50 PRJNA669862 2021-01-01 04:16:20 8\n",
"51 PRJNA689811 2021-01-05 15:25:38 1\n",
"52 PRJNA686984 2021-01-06 19:44:09 543\n",
"53 PRJNA628662 2021-01-11 13:39:29 4\n",
"54 PRJNA657032 2021-01-12 00:10:32 71\n",
"55 PRJNA692472 2021-01-15 17:04:55 47"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "A6pJykchKqGB",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "bd30abee-27b0-4456-ff74-2a38846f6e7f"
},
"source": [
"pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"OXFORD_NANOPORE\" and LibraryStrategy=\"AMPLICON\" group by BioProject order by ReleaseDate,N desc')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" BioProject ReleaseDate N\n",
"0 PRJNA613958 2020-03-24 02:56:38 6\n",
"1 PRJNA614995 2020-03-24 20:18:34 126\n",
"2 PRJNA622817 2020-04-05 04:02:53 5\n",
"3 PRJNA616147 2020-04-14 23:47:18 3\n",
"4 PRJNA627229 2020-04-23 05:01:42 56\n",
"5 PRJNA610248 2020-04-28 22:04:43 162\n",
"6 PRJNA614976 2020-04-29 20:43:51 60\n",
"7 PRJEB37886 2020-04-30 11:59:46 20968\n",
"8 PRJNA632678 2020-05-14 10:09:40 1\n",
"9 PRJEB38388 2020-05-21 12:11:53 584\n",
"10 PRJEB37966 2020-05-27 15:10:51 123\n",
"11 PRJNA634965 2020-06-01 00:08:21 3\n",
"12 PRJNA628662 2020-07-06 09:38:16 72\n",
"13 PRJNA645718 2020-07-13 12:02:15 5\n",
"14 PRJNA645970 2020-07-13 18:54:07 173\n",
"15 PRJEB39487 2020-07-23 18:43:53 339\n",
"16 PRJEB38459 2020-07-24 14:46:31 1\n",
"17 PRJNA649101 2020-07-30 07:49:27 24\n",
"18 PRJNA640656 2020-08-06 10:36:18 88\n",
"19 PRJNA650037 2020-08-13 12:33:55 210\n",
"20 PRJNA658490 2020-08-21 12:57:54 1\n",
"21 PRJNA667434 2020-10-06 06:16:12 10\n",
"22 PRJEB40711 2020-10-13 08:22:40 39\n",
"23 PRJEB40277 2020-10-16 12:24:41 1130\n",
"24 PRJNA669553 2020-10-19 12:07:54 228\n",
"25 PRJNA669459 2020-10-20 05:14:45 15\n",
"26 PRJNA670824 2020-10-23 03:53:32 101\n",
"27 PRJNA669043 2020-11-12 08:20:43 255\n",
"28 PRJNA682735 2020-12-24 17:50:34 18\n",
"29 PRJNA688208 2020-12-29 13:18:26 3\n",
"30 PRJNA686984 2020-12-30 16:57:41 1\n",
"31 PRJEB39014 2021-01-06 11:33:41 944"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZUzqjj6Ek2GR"
},
"source": [
"The number of SRA runs by library strategy and platform shows that amplicon sequencing on Illumina is the most abundant type of data:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "k0oCDclt2SbW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "0e738e80-132c-4b4f-dcfc-aa97211c9408"
},
"source": [
"print(pysqldf('select LibraryStrategy, Platform, count(*) as N, count(distinct BioProject) as P from ncbi group by Platform, LibraryStrategy order by Platform asc, N desc').to_markdown(index=False))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"| LibraryStrategy | Platform | N | P |\n",
"|:--------------------|:----------------|-------:|----:|\n",
"| AMPLICON | BGISEQ | 21 | 1 |\n",
"| RNA-Seq | BGISEQ | 1 | 1 |\n",
"| WGA | BGISEQ | 1 | 1 |\n",
"| AMPLICON | CAPILLARY | 7 | 1 |\n",
"| AMPLICON | ILLUMINA | 149668 | 61 |\n",
"| WGS | ILLUMINA | 6201 | 43 |\n",
"| RNA-Seq | ILLUMINA | 4434 | 33 |\n",
"| Targeted-Capture | ILLUMINA | 1690 | 11 |\n",
"| WGA | ILLUMINA | 377 | 4 |\n",
"| OTHER | ILLUMINA | 148 | 13 |\n",
"| AMPLICON | ION_TORRENT | 435 | 7 |\n",
"| RNA-Seq | ION_TORRENT | 42 | 4 |\n",
"| WGS | ION_TORRENT | 33 | 6 |\n",
"| AMPLICON | OXFORD_NANOPORE | 25754 | 32 |\n",
"| WGS | OXFORD_NANOPORE | 936 | 12 |\n",
"| WGA | OXFORD_NANOPORE | 580 | 3 |\n",
"| RNA-Seq | OXFORD_NANOPORE | 10 | 5 |\n",
"| OTHER | OXFORD_NANOPORE | 4 | 1 |\n",
"| AMPLICON | PACBIO_SMRT | 12 | 1 |\n",
"| Synthetic-Long-Read | PACBIO_SMRT | 2 | 1 |\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "CfmjR7LcstW5"
},
"source": [
"counts = pysqldf('select LibraryStrategy, Platform, count(*) as N, count(distinct BioProject) as P from ncbi group by Platform, LibraryStrategy order by Platform asc, N desc')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"id": "oJ-PLinsImnQ",
"outputId": "e06f3499-eae8-4cdc-9877-32c2df9dea5e"
},
"source": [
"counts"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" LibraryStrategy Platform N P\n",
"0 AMPLICON BGISEQ 21 1\n",
"1 RNA-Seq BGISEQ 1 1\n",
"2 WGA BGISEQ 1 1\n",
"3 AMPLICON CAPILLARY 7 1\n",
"4 AMPLICON ILLUMINA 149668 61\n",
"5 WGS ILLUMINA 6201 43\n",
"6 RNA-Seq ILLUMINA 4434 33\n",
"7 Targeted-Capture ILLUMINA 1690 11\n",
"8 WGA ILLUMINA 377 4\n",
"9 OTHER ILLUMINA 148 13\n",
"10 AMPLICON ION_TORRENT 435 7\n",
"11 RNA-Seq ION_TORRENT 42 4\n",
"12 WGS ION_TORRENT 33 6\n",
"13 AMPLICON OXFORD_NANOPORE 25754 32\n",
"14 WGS OXFORD_NANOPORE 936 12\n",
"15 WGA OXFORD_NANOPORE 580 3\n",
"16 RNA-Seq OXFORD_NANOPORE 10 5\n",
"17 OTHER OXFORD_NANOPORE 4 1\n",
"18 AMPLICON PACBIO_SMRT 12 1\n",
"19 Synthetic-Long-Read PACBIO_SMRT 2 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
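{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same per-platform, per-strategy summary can also be computed with pandas alone. The cell below is a sketch, assuming there are no missing values in `Platform` or `LibraryStrategy` (`groupby` drops rows with missing keys by default). `counts_pd` is a name introduced here for illustration. Sorting by `N` puts the most abundant combination first, which can be compared against the `counts` table above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# pandas-only equivalent of the SQL aggregation stored in `counts`\n",
"counts_pd = (\n",
"    ncbi.groupby(['Platform', 'LibraryStrategy'])\n",
"        .agg(N=('Run', 'size'), P=('BioProject', 'nunique'))  # runs and distinct BioProjects per group\n",
"        .reset_index()\n",
"        .sort_values('N', ascending=False)\n",
")\n",
"counts_pd.head()"
],
"execution_count": null,
"outputs": []
},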
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 534
},
"id": "W4f3U376m1US",
"outputId": "4ff3b2ef-8d64-40f6-a074-a383552dd92b"
},
"source": [
"from math import pi\n",
"import bokeh.io\n",
"from bokeh.models import (ColorBar, ColumnDataSource, LogColorMapper,\n",
" LogTicker, PrintfTickFormatter)\n",
"from bokeh.plotting import figure\n",
"from bokeh.transform import transform\n",
"from bokeh.palettes import viridis\n",
"\n",
"bokeh.io.output_notebook()\n",
"\n",
"source = ColumnDataSource(counts)\n",
"colors = list(reversed(viridis(64)))\n",
"\n",
"mapper = LogColorMapper(palette=colors, low=counts['N'].min(), high=counts['N'].max())\n",
"\n",
"TOOLTIPS = [\n",
" (\"SRA accessions\",\"@N\"),\n",
" (\"BioProjects\",\"@P\")\n",
"]\n",
"\n",
"p = figure(\n",
" plot_width=600, \n",
" plot_height=500, \n",
" x_range=counts['LibraryStrategy'].unique(), \n",
" y_range=counts['Platform'].unique(),\n",
" x_axis_location=\"above\",\n",
" tooltips=TOOLTIPS,\n",
" tools='save',\n",
" )\n",
"p.rect(\n",
" x=\"LibraryStrategy\", \n",
" y=\"Platform\", \n",
" width=1, \n",
" height=1, \n",
" source=source,\n",
" line_color=None, \n",
" fill_color=transform('N', mapper)\n",
" )\n",
"color_bar = ColorBar(\n",
" color_mapper=mapper, \n",
" location=(0, 0),\n",
" ticker=LogTicker(),\n",
" label_standoff=12,\n",
" formatter=PrintfTickFormatter(format=\"%d\")\n",
" )\n",
"p.add_layout(color_bar, 'right')\n",
"p.xaxis.major_label_orientation = pi/2\n",
"p.axis.axis_line_color = None\n",
"p.axis.major_tick_line_color = None\n",
"p.ygrid.grid_line_color = None\n",
"p.xgrid.grid_line_color = None\n",
"\n",
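"# Reset Bokeh's output state before showing the plot; if that fails,\n",
"# fall back to re-enabling notebook output and showing it again.\n",
"try:\n",
"    bokeh.io.reset_output()\n",
"    bokeh.io.output_notebook()\n",
"    bokeh.io.show(p)\n",
"except Exception:\n",
"    bokeh.io.output_notebook()\n",
"    bokeh.io.show(p)\n",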
"\n",
"r = dp.Report(\n",
" dp.Plot(p)\n",
")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"\u001b[32mConnected successfully to https://datapane.com as nekrut\u001b[0m\n"
],
"name": "stdout"
},
{
"output_type": "display_data",
"data": {
"application/javascript": [
"\n",
"(function(root) {\n",
" function now() {\n",
" return new Date();\n",
" }\n",
"\n",
" var force = true;\n",
"\n",
" if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n",
" root._bokeh_onload_callbacks = [];\n",
" root._bokeh_is_loading = undefined;\n",
" }\n",
"\n",
" var JS_MIME_TYPE = 'application/javascript';\n",
" var HTML_MIME_TYPE = 'text/html';\n",
" var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n",
" var CLASS_NAME = 'output_bokeh rendered_html';\n",
"\n",
" /**\n",
" * Render data to the DOM node\n",
" */\n",
" function render(props, node) {\n",
" var script = document.createElement(\"script\");\n",
" node.appendChild(script);\n",
" }\n",
"\n",
" /**\n",
" * Handle when an output is cleared or removed\n",
" */\n",
" function handleClearOutput(event, handle) {\n",
" var cell = handle.cell;\n",
"\n",
" var id = cell.output_area._bokeh_element_id;\n",
" var server_id = cell.output_area._bokeh_server_id;\n",
" // Clean up Bokeh references\n",
" if (id != null && id in Bokeh.index) {\n",
" Bokeh.index[id].model.document.clear();\n",
" delete Bokeh.index[id];\n",
" }\n",
"\n",
" if (server_id !== undefined) {\n",
" // Clean up Bokeh references\n",
" var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n",
" cell.notebook.kernel.execute(cmd, {\n",
" iopub: {\n",
" output: function(msg) {\n",
" var id = msg.content.text.trim();\n",
" if (id in Bokeh.index) {\n",
" Bokeh.index[id].model.document.clear();\n",
" delete Bokeh.index[id];\n",
" }\n",
" }\n",
" }\n",
" });\n",
" // Destroy server and session\n",
" var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n",
" cell.notebook.kernel.execute(cmd);\n",
" }\n",
" }\n",
"\n",
" /**\n",
" * Handle when a new output is added\n",
" */\n",
" function handleAddOutput(event, handle) {\n",
" var output_area = handle.output_area;\n",
" var output = handle.output;\n",
"\n",
" // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n",
" if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n",
" return\n",
" }\n",
"\n",
" var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n",
"\n",
" if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n",
" toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n",
" // store reference to embed id on output_area\n",
" output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n",
" }\n",
" if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n",
" var bk_div = document.createElement(\"div\");\n",
" bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n",
" var script_attrs = bk_div.children[0].attributes;\n",
" for (var i = 0; i < script_attrs.length; i++) {\n",
" toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n",
" toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n",
" }\n",
" // store reference to server id on output_area\n",
" output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n",
" }\n",
" }\n",
"\n",
" function register_renderer(events, OutputArea) {\n",
"\n",
" function append_mime(data, metadata, element) {\n",
" // create a DOM node to render to\n",
" var toinsert = this.create_output_subarea(\n",
" metadata,\n",
" CLASS_NAME,\n",
" EXEC_MIME_TYPE\n",
" );\n",
" this.keyboard_manager.register_events(toinsert);\n",
" // Render to node\n",
" var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n",
" render(props, toinsert[toinsert.length - 1]);\n",
" element.append(toinsert);\n",
" return toinsert\n",
" }\n",
"\n",
" /* Handle when an output is cleared or removed */\n",
" events.on('clear_output.CodeCell', handleClearOutput);\n",
" events.on('delete.Cell', handleClearOutput);\n",
"\n",
" /* Handle when a new output is added */\n",
" events.on('output_added.OutputArea', handleAddOutput);\n",
"\n",
" /**\n",
" * Register the mime type and append_mime function with output_area\n",
" */\n",
" OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n",
" /* Is output safe? */\n",
" safe: true,\n",
" /* Index of renderer in `output_area.display_order` */\n",
" index: 0\n",
" });\n",
" }\n",
"\n",
" // register the mime type if in Jupyter Notebook environment and previously unregistered\n",
" if (root.Jupyter !== undefined) {\n",
" var events = require('base/js/events');\n",
" var OutputArea = require('notebook/js/outputarea').OutputArea;\n",
"\n",
" if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n",
" register_renderer(events, OutputArea);\n",
" }\n",
" }\n",
"\n",
" \n",
" if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n",
" root._bokeh_timeout = Date.now() + 5000;\n",
" root._bokeh_failed_load = false;\n",
" }\n",
"\n",
" var NB_LOAD_WARNING = {'data': {'text/html':\n",
" \"<div style='background-color: #fdd'>\\n\"+\n",
" \"<p>\\n\"+\n",
" \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n",
" \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n",
" \"</p>\\n\"+\n",
" \"<ul>\\n\"+\n",
" \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n",
" \"<li>use INLINE resources instead, as so:</li>\\n\"+\n",
" \"</ul>\\n\"+\n",
" \"<code>\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"</code>\\n\"+\n",
" \"</div>\"}};\n",
"\n",
" function display_loaded() {\n",
" var el = document.getElementById(null);\n",
" if (el != null) {\n",
" el.textContent = \"BokehJS is loading...\";\n",
" }\n",
" if (root.Bokeh !== undefined) {\n",
" if (el != null) {\n",
" el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n",
" }\n",
" } else if (Date.now() < root._bokeh_timeout) {\n",
" setTimeout(display_loaded, 100)\n",
" }\n",
" }\n",
"\n",
"\n",
" function run_callbacks() {\n",
" try {\n",
" root._bokeh_onload_callbacks.forEach(function(callback) {\n",
" if (callback != null)\n",
" callback();\n",
" });\n",
" } finally {\n",
" delete root._bokeh_onload_callbacks\n",
" }\n",
" console.debug(\"Bokeh: all callbacks have finished\");\n",
" }\n",
"\n",
" function load_libs(css_urls, js_urls, callback) {\n",
" if (css_urls == null) css_urls = [];\n",
" if (js_urls == null) js_urls = [];\n",
"\n",
" root._bokeh_onload_callbacks.push(callback);\n",
" if (root._bokeh_is_loading > 0) {\n",
" console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n",
" return null;\n",
" }\n",
" if (js_urls == null || js_urls.length === 0) {\n",
" run_callbacks();\n",
" return null;\n",
" }\n",
" console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n",
" root._bokeh_is_loading = css_urls.length + js_urls.length;\n",
"\n",
" function on_load() {\n",
" root._bokeh_is_loading--;\n",
" if (root._bokeh_is_loading === 0) {\n",
" console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n",
" run_callbacks()\n",
" }\n",
" }\n",
"\n",
" function on_error() {\n",
" console.error(\"failed to load \" + url);\n",
" }\n",
"\n",
" for (var i = 0; i < css_urls.length; i++) {\n",
" var url = css_urls[i];\n",
" const element = document.createElement(\"link\");\n",
" element.onload = on_load;\n",
" element.onerror = on_error;\n",
" element.rel = \"stylesheet\";\n",
" element.type = \"text/css\";\n",
" element.href = url;\n",
" console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n",
" document.body.appendChild(element);\n",
" }\n",
"\n",
" for (var i = 0; i < js_urls.length; i++) {\n",
" var url = js_urls[i];\n",
" var element = document.createElement('script');\n",
" element.onload = on_load;\n",
" element.onerror = on_error;\n",
" element.async = false;\n",
" element.src = url;\n",
" \n",
" console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n",
" document.head.appendChild(element);\n",
" }\n",
" };\n",
"\n",
" function inject_raw_css(css) {\n",
" const element = document.createElement(\"style\");\n",
" element.appendChild(document.createTextNode(css));\n",
" document.body.appendChild(element);\n",
" }\n",
"\n",
" \n",
" var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n",
" var css_urls = [];\n",
" \n",
"\n",
" var inline_js = [\n",
" function(Bokeh) {\n",
" Bokeh.set_log_level(\"info\");\n",
" },\n",
" function(Bokeh) {\n",
" \n",
" \n",
" }\n",
" ];\n",
"\n",
" function run_inline_js() {\n",
" \n",
" if (root.Bokeh !== undefined || force === true) {\n",
" \n",
" for (var i = 0; i < inline_js.length; i++) {\n",
" inline_js[i].call(root, root.Bokeh);\n",
" }\n",
" } else if (Date.now() < root._bokeh_timeout) {\n",
" setTimeout(run_inline_js, 100);\n",
" } else if (!root._bokeh_failed_load) {\n",
" console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n",
" root._bokeh_failed_load = true;\n",
" } else if (force !== true) {\n",
" var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n",
" cell.output_area.append_execute_result(NB_LOAD_WARNING)\n",
" }\n",
"\n",
" }\n",
"\n",
" if (root._bokeh_is_loading === 0) {\n",
" console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n",
" run_inline_js();\n",
" } else {\n",
" load_libs(css_urls, js_urls, function() {\n",
" console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n",
" run_inline_js();\n",
" });\n",
" }\n",
"}(window));"
],
"application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"\\n\"+\n \"
\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"
\\n\"+\n \"- re-rerun `output_notebook()` to attempt to load from CDN again, or
\\n\"+\n \"- use INLINE resources instead, as so:
\\n\"+\n \"
\\n\"+\n \"
\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(null);\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n 
element.onerror = on_error;\n element.async = false;\n element.src = url;\n \n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));"
},
"metadata": {
"tags": []
}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "display_data",
"data": {
"application/javascript": [
"(function(root) {\n",
" function embed_document(root) {\n",
" \n",
" var docs_json = {\"c07fda4c-7d75-4f8b-bcef-d0eb9b845f75\":{\"roots\":{\"references\":[{\"attributes\":{\"above\":[{\"id\":\"3706\"}],\"center\":[{\"id\":\"3708\"},{\"id\":\"3711\"}],\"left\":[{\"id\":\"3709\"}],\"plot_height\":500,\"renderers\":[{\"id\":\"3720\"}],\"right\":[{\"id\":\"3724\"}],\"title\":{\"id\":\"3728\"},\"toolbar\":{\"id\":\"3714\"},\"x_range\":{\"id\":\"3698\"},\"x_scale\":{\"id\":\"3702\"},\"y_range\":{\"id\":\"3700\"},\"y_scale\":{\"id\":\"3704\"}},\"id\":\"3697\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"data_source\":{\"id\":\"3695\"},\"glyph\":{\"id\":\"3718\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"3719\"},\"selection_glyph\":null,\"view\":{\"id\":\"3721\"}},\"id\":\"3720\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"factors\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"WGS\",\"Targeted-Capture\",\"OTHER\",\"Synthetic-Long-Read\"]},\"id\":\"3698\",\"type\":\"FactorRange\"},{\"attributes\":{\"callback\":null,\"tooltips\":[[\"SRA 
accessions\",\"@N\"],[\"BioProjects\",\"@P\"]]},\"id\":\"3713\",\"type\":\"HoverTool\"},{\"attributes\":{\"axis\":{\"id\":\"3706\"},\"grid_line_color\":null,\"ticker\":null},\"id\":\"3708\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"3722\",\"type\":\"LogTicker\"},{\"attributes\":{\"axis\":{\"id\":\"3709\"},\"dimension\":1,\"grid_line_color\":null,\"ticker\":null},\"id\":\"3711\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"3707\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"format\":\"%d\"},\"id\":\"3723\",\"type\":\"PrintfTickFormatter\"},{\"attributes\":{},\"id\":\"3702\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3730\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{},\"id\":\"3710\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3730\"},\"major_label_orientation\":1.5707963267948966,\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3707\"}},\"id\":\"3706\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"high\":149668,\"low\":1,\"palette\":[\"#FDE724\",\"#F1E51C\",\"#E7E419\",\"#DCE218\",\"#D2E11B\",\"#C7E01F\",\"#BDDE26\",\"#B2DD2C\",\"#A7DB33\",\"#9DD93A\",\"#92D741\",\"#88D547\",\"#7ED24E\",\"#74D054\",\"#6BCD59\",\"#62CA5F\",\"#59C764\",\"#51C468\",\"#49C16D\",\"#42BE71\",\"#3BBA75\",\"#35B778\",\"#2EB27C\",\"#29AF7F\",\"#25AB81\",\"#22A784\",\"#20A485\",\"#1EA087\",\"#1E9C89\",\"#1E998A\",\"#1F958B\",\"#20918C\",\"#218D8C\",\"#22898D\",\"#24868D\",\"#25828E\",\"#277E8E\",\"#287A8E\",\"#2A778E\",\"#2B738E\",\"#2D6F8E\",\"#2E6B8E\",\"#30678D\",\"#32628D\",\"#345E8D\",\"#365A8C\",\"#38568B\",\"#3A528B\",\"#3C4D8A\",\"#3E4989\",\"#404487\",\"#424085\",\"#433B83\",\"#453681\",\"#46317E\",\"#472C7B\",\"#472777\",\"#482273\",\"#481D6F\",\"#47186A\",\"#471265\",\"#460C5F\",\"#45065A\",\"#440154\"]},\"id\":\"3696\",\"type\":\"LogColorMapper\"},{\"attributes\":{},\"id\":\"3734\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3732\"
},\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3710\"}},\"id\":\"3709\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"3712\"},{\"id\":\"3713\"}]},\"id\":\"3714\",\"type\":\"Toolbar\"},{\"attributes\":{\"color_mapper\":{\"id\":\"3696\"},\"formatter\":{\"id\":\"3723\"},\"label_standoff\":12,\"location\":[0,0],\"ticker\":{\"id\":\"3722\"}},\"id\":\"3724\",\"type\":\"ColorBar\"},{\"attributes\":{},\"id\":\"3712\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"3704\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3733\",\"type\":\"Selection\"},{\"attributes\":{\"factors\":[\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\"]},\"id\":\"3700\",\"type\":\"FactorRange\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3696\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3719\",\"type\":\"Rect\"},{\"attributes\":{\"source\":{\"id\":\"3695\"}},\"id\":\"3721\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"3732\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3696\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3718\",\"type\":\"Rect\"},{\"attributes\":{\"data\":{\"LibraryStrategy\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"AMPLICON\",\"AMPLICON\",\"WGS\",\"RNA-Seq\",\"Targeted-Capture\",\"WGA\",\"OTHER\",\"AMPLICON\",\"RNA-Seq\",\"WGS\",\"AMPLICON\",\"WGS\",\"WGA\",\"RNA-Seq\",\"OTHER\",\"AMPLICON\",\"Synthetic-Long-Read\"]
,\"N\":[21,1,1,7,149668,6201,4434,1690,377,148,435,42,33,25754,936,580,10,4,12,2],\"P\":[1,1,1,1,61,43,33,11,4,13,7,4,6,32,12,3,5,1,1,1],\"Platform\":[\"BGISEQ\",\"BGISEQ\",\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ION_TORRENT\",\"ION_TORRENT\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\",\"PACBIO_SMRT\"],\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]},\"selected\":{\"id\":\"3733\"},\"selection_policy\":{\"id\":\"3734\"}},\"id\":\"3695\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"text\":\"\"},\"id\":\"3728\",\"type\":\"Title\"}],\"root_ids\":[\"3697\"]},\"title\":\"Bokeh Application\",\"version\":\"2.0.0\"}};\n",
" var render_items = [{\"docid\":\"c07fda4c-7d75-4f8b-bcef-d0eb9b845f75\",\"root_ids\":[\"3697\"],\"roots\":{\"3697\":\"96659d31-0192-4681-96d3-cb1de4e276ff\"}}];\n",
" root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n",
"\n",
" }\n",
" if (root.Bokeh !== undefined) {\n",
" embed_document(root);\n",
" } else {\n",
" var attempts = 0;\n",
" var timer = setInterval(function(root) {\n",
" if (root.Bokeh !== undefined) {\n",
" clearInterval(timer);\n",
" embed_document(root);\n",
" } else {\n",
" attempts++;\n",
" if (attempts > 100) {\n",
" clearInterval(timer);\n",
" console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n",
" }\n",
" }\n",
" }, 10, root)\n",
" }\n",
"})(window);"
],
"application/vnd.bokehjs_exec.v0+json": ""
},
"metadata": {
"tags": [],
"application/vnd.bokehjs_exec.v0+json": {
"id": "3697"
}
}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 317
},
"id": "ZxCgvj7KPP5v",
"outputId": "49b1c64c-e148-4f61-fcdc-7bf6ffd31360"
},
"source": [
""
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"application/javascript": [
"\n",
"(function(root) {\n",
" function now() {\n",
" return new Date();\n",
" }\n",
"\n",
" var force = true;\n",
"\n",
" if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n",
" root._bokeh_onload_callbacks = [];\n",
" root._bokeh_is_loading = undefined;\n",
" }\n",
"\n",
" var JS_MIME_TYPE = 'application/javascript';\n",
" var HTML_MIME_TYPE = 'text/html';\n",
" var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n",
" var CLASS_NAME = 'output_bokeh rendered_html';\n",
"\n",
" /**\n",
" * Render data to the DOM node\n",
" */\n",
" function render(props, node) {\n",
" var script = document.createElement(\"script\");\n",
" node.appendChild(script);\n",
" }\n",
"\n",
" /**\n",
" * Handle when an output is cleared or removed\n",
" */\n",
" function handleClearOutput(event, handle) {\n",
" var cell = handle.cell;\n",
"\n",
" var id = cell.output_area._bokeh_element_id;\n",
" var server_id = cell.output_area._bokeh_server_id;\n",
" // Clean up Bokeh references\n",
" if (id != null && id in Bokeh.index) {\n",
" Bokeh.index[id].model.document.clear();\n",
" delete Bokeh.index[id];\n",
" }\n",
"\n",
" if (server_id !== undefined) {\n",
" // Clean up Bokeh references\n",
" var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n",
" cell.notebook.kernel.execute(cmd, {\n",
" iopub: {\n",
" output: function(msg) {\n",
" var id = msg.content.text.trim();\n",
" if (id in Bokeh.index) {\n",
" Bokeh.index[id].model.document.clear();\n",
" delete Bokeh.index[id];\n",
" }\n",
" }\n",
" }\n",
" });\n",
" // Destroy server and session\n",
" var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n",
" cell.notebook.kernel.execute(cmd);\n",
" }\n",
" }\n",
"\n",
" /**\n",
" * Handle when a new output is added\n",
" */\n",
" function handleAddOutput(event, handle) {\n",
" var output_area = handle.output_area;\n",
" var output = handle.output;\n",
"\n",
" // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n",
" if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n",
" return\n",
" }\n",
"\n",
" var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n",
"\n",
" if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n",
" toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n",
" // store reference to embed id on output_area\n",
" output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n",
" }\n",
" if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n",
" var bk_div = document.createElement(\"div\");\n",
" bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n",
" var script_attrs = bk_div.children[0].attributes;\n",
" for (var i = 0; i < script_attrs.length; i++) {\n",
" toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n",
" toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n",
" }\n",
" // store reference to server id on output_area\n",
" output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n",
" }\n",
" }\n",
"\n",
" function register_renderer(events, OutputArea) {\n",
"\n",
" function append_mime(data, metadata, element) {\n",
" // create a DOM node to render to\n",
" var toinsert = this.create_output_subarea(\n",
" metadata,\n",
" CLASS_NAME,\n",
" EXEC_MIME_TYPE\n",
" );\n",
" this.keyboard_manager.register_events(toinsert);\n",
" // Render to node\n",
" var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n",
" render(props, toinsert[toinsert.length - 1]);\n",
" element.append(toinsert);\n",
" return toinsert\n",
" }\n",
"\n",
" /* Handle when an output is cleared or removed */\n",
" events.on('clear_output.CodeCell', handleClearOutput);\n",
" events.on('delete.Cell', handleClearOutput);\n",
"\n",
" /* Handle when a new output is added */\n",
" events.on('output_added.OutputArea', handleAddOutput);\n",
"\n",
" /**\n",
" * Register the mime type and append_mime function with output_area\n",
" */\n",
" OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n",
" /* Is output safe? */\n",
" safe: true,\n",
" /* Index of renderer in `output_area.display_order` */\n",
" index: 0\n",
" });\n",
" }\n",
"\n",
" // register the mime type if in Jupyter Notebook environment and previously unregistered\n",
" if (root.Jupyter !== undefined) {\n",
" var events = require('base/js/events');\n",
" var OutputArea = require('notebook/js/outputarea').OutputArea;\n",
"\n",
" if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n",
" register_renderer(events, OutputArea);\n",
" }\n",
" }\n",
"\n",
" \n",
" if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n",
" root._bokeh_timeout = Date.now() + 5000;\n",
" root._bokeh_failed_load = false;\n",
" }\n",
"\n",
" var NB_LOAD_WARNING = {'data': {'text/html':\n",
" \"<div style='background-color: #fdd'>\\n\"+\n",
" \"<p>\\n\"+\n",
" \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n",
" \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n",
" \"</p>\\n\"+\n",
" \"<ul>\\n\"+\n",
" \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n",
" \"<li>use INLINE resources instead, as so:</li>\\n\"+\n",
" \"</ul>\\n\"+\n",
" \"<code>\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"</code>\\n\"+\n",
" \"</div>\"}};\n",
"\n",
" function display_loaded() {\n",
" var el = document.getElementById(null);\n",
" if (el != null) {\n",
" el.textContent = \"BokehJS is loading...\";\n",
" }\n",
" if (root.Bokeh !== undefined) {\n",
" if (el != null) {\n",
" el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n",
" }\n",
" } else if (Date.now() < root._bokeh_timeout) {\n",
" setTimeout(display_loaded, 100)\n",
" }\n",
" }\n",
"\n",
"\n",
" function run_callbacks() {\n",
" try {\n",
" root._bokeh_onload_callbacks.forEach(function(callback) {\n",
" if (callback != null)\n",
" callback();\n",
" });\n",
" } finally {\n",
" delete root._bokeh_onload_callbacks\n",
" }\n",
" console.debug(\"Bokeh: all callbacks have finished\");\n",
" }\n",
"\n",
" function load_libs(css_urls, js_urls, callback) {\n",
" if (css_urls == null) css_urls = [];\n",
" if (js_urls == null) js_urls = [];\n",
"\n",
" root._bokeh_onload_callbacks.push(callback);\n",
" if (root._bokeh_is_loading > 0) {\n",
" console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n",
" return null;\n",
" }\n",
" if (js_urls == null || js_urls.length === 0) {\n",
" run_callbacks();\n",
" return null;\n",
" }\n",
" console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n",
" root._bokeh_is_loading = css_urls.length + js_urls.length;\n",
"\n",
" function on_load() {\n",
" root._bokeh_is_loading--;\n",
" if (root._bokeh_is_loading === 0) {\n",
" console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n",
" run_callbacks()\n",
" }\n",
" }\n",
"\n",
" function on_error() {\n",
" console.error(\"failed to load \" + url);\n",
" }\n",
"\n",
" for (var i = 0; i < css_urls.length; i++) {\n",
" var url = css_urls[i];\n",
" const element = document.createElement(\"link\");\n",
" element.onload = on_load;\n",
" element.onerror = on_error;\n",
" element.rel = \"stylesheet\";\n",
" element.type = \"text/css\";\n",
" element.href = url;\n",
" console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n",
" document.body.appendChild(element);\n",
" }\n",
"\n",
" for (var i = 0; i < js_urls.length; i++) {\n",
" var url = js_urls[i];\n",
" var element = document.createElement('script');\n",
" element.onload = on_load;\n",
" element.onerror = on_error;\n",
" element.async = false;\n",
" element.src = url;\n",
" \n",
" console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n",
" document.head.appendChild(element);\n",
" }\n",
" };\n",
"\n",
" function inject_raw_css(css) {\n",
" const element = document.createElement(\"style\");\n",
" element.appendChild(document.createTextNode(css));\n",
" document.body.appendChild(element);\n",
" }\n",
"\n",
" \n",
" var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n",
" var css_urls = [];\n",
" \n",
"\n",
" var inline_js = [\n",
" function(Bokeh) {\n",
" Bokeh.set_log_level(\"info\");\n",
" },\n",
" function(Bokeh) {\n",
" \n",
" \n",
" }\n",
" ];\n",
"\n",
" function run_inline_js() {\n",
" \n",
" if (root.Bokeh !== undefined || force === true) {\n",
" \n",
" for (var i = 0; i < inline_js.length; i++) {\n",
" inline_js[i].call(root, root.Bokeh);\n",
" }\n",
" } else if (Date.now() < root._bokeh_timeout) {\n",
" setTimeout(run_inline_js, 100);\n",
" } else if (!root._bokeh_failed_load) {\n",
" console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n",
" root._bokeh_failed_load = true;\n",
" } else if (force !== true) {\n",
" var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n",
" cell.output_area.append_execute_result(NB_LOAD_WARNING)\n",
" }\n",
"\n",
" }\n",
"\n",
" if (root._bokeh_is_loading === 0) {\n",
" console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n",
" run_inline_js();\n",
" } else {\n",
" load_libs(css_urls, js_urls, function() {\n",
" console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n",
" run_inline_js();\n",
" });\n",
" }\n",
"}(window));"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" <div class=\"bk-root\" id=\"3e142481-48c2-4c2a-809e-8822fe0f9dc9\" data-root-id=\"3209\"></div>\n"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "display_data",
"data": {
"application/javascript": [
"(function(root) {\n",
" function embed_document(root) {\n",
" \n",
" var docs_json = {\"c58613ee-1a5a-4479-99e2-441891cef0d6\":{\"roots\":{\"references\":[{\"attributes\":{\"above\":[{\"id\":\"3218\"}],\"center\":[{\"id\":\"3220\"},{\"id\":\"3223\"}],\"left\":[{\"id\":\"3221\"}],\"plot_height\":300,\"plot_width\":400,\"renderers\":[{\"id\":\"3232\"}],\"right\":[{\"id\":\"3236\"}],\"title\":{\"id\":\"3240\"},\"toolbar\":{\"id\":\"3226\"},\"toolbar_location\":null,\"x_range\":{\"id\":\"3210\"},\"x_scale\":{\"id\":\"3214\"},\"y_range\":{\"id\":\"3212\"},\"y_scale\":{\"id\":\"3216\"}},\"id\":\"3209\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"axis\":{\"id\":\"3221\"},\"dimension\":1,\"grid_line_color\":null,\"ticker\":null},\"id\":\"3223\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3208\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3231\",\"type\":\"Rect\"},{\"attributes\":{\"data_source\":{\"id\":\"3207\"},\"glyph\":{\"id\":\"3230\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"3231\"},\"selection_glyph\":null,\"view\":{\"id\":\"3233\"}},\"id\":\"3232\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"high\":149668,\"low\":1,\"palette\":[\"#FDE724\",\"#F1E51C\",\"#E7E419\",\"#DCE218\",\"#D2E11B\",\"#C7E01F\",\"#BDDE26\",\"#B2DD2C\",\"#A7DB33\",\"#9DD93A\",\"#92D741\",\"#88D547\",\"#7ED24E\",\"#74D054\",\"#6BCD59\",\"#62CA5F\",\"#59C764\",\"#51C468\",\"#49C16D\",\"#42BE71\",\"#3BBA75\",\"#35B778\",\"#2EB27C\",\"#29AF7F\",\"#25AB81\",\"#22A784\",\"#20A485\",\"#1EA087\",\"#1E9C89\",\"#1E998A\",\"#1F958B\",\"#20918C\",\"#218D8C\",\"#22898D\",\"#24868D\",\"#25828E\",\"#277E8E\",\"#287A8E\",\"#2A778E\",\"#2B738E\",\"#2D6F8E\",\"#2E6B8E\",\"#30678D\",\"#32628D\",\"#345E8D\",\"#365A8C\",\"#38568B\",\"#3A528B\",\"#3C4D8A\",\"#3E4989\",\"#404487\",\"#424085\",\"#433B83\",\"#453681\",\"#46317E\",\"#472C7B\",\"#472777\",\"#482273\",\"#481D6F\",\"#47186A\",\"#471265\",\"#460C5F\",\"#45065A\",\"#440154\"]},\"id\":\"3208\",\"type\":\"LogColorMapper\"},{\"attributes\":{\"source\":{\"id\":\"3207\"}},\"id\":\"3233\",\"type\":\"CDSView\"},{\"attributes\":{\"factors\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"WGS\",\"Targeted-Capture\",\"OTHER\",\"Synthetic-Long-Read\"]},\"id\":\"3210\",\"type\":\"FactorRange\"},{\"attributes\":{\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3208\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3230\",\"type\":\"Rect\"},{\"attributes\":{},\"id\":\"3234\",\"type\":\"LogTicker\"},{\"attributes\":{},\"id\":\"3219\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"data\":{\"LibraryStrategy\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"AMPLICON\",\"AMPLICON\",\"WGS\",\"RNA-Seq\",\"Targeted-Capture\",\"WGA\",\"OTHER\",\"AMPLICON\",\"RNA-Seq\",\"WGS\",\"AMPLICON\",\"WGS\",\"WGA\",\"RNA-Seq\",\"OTHER\",\"AMPLICON\",\"Synthetic-Long-Read\"],\"N\":[21,1,1,7,149668,6201,4434,1690,377,148,435,42,33,25754,936,580,10,4,12,2],\"P\":[1,1,1,1,61,43,33,11,4,13,7,4,6,32,12,3,5,1,1,1],\"Platform\":[\"BGISEQ\",\"BGISEQ\",\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ION_TORRENT\",\"ION_TORRENT\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\",\"PACBIO_SMRT\"],\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]},\"selected\":{\"id\":\"3245\"},\"selection_policy\":{\"id\":\"3246\"}},\"id\":\"3207\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"format\":\"%d\"},\"id\":\"3235\",\"type\":\"PrintfTickFormatter\"},{\"attributes\":{\"color_mapper\":{\"id\":\"3208\"},\"formatter\":{\"id\":\"3235\"},\"label_standoff\":12,\"location\":[0,0],\"ticker\":{\"id\":\"3234\"}},\"id\":\"3236\",\"type\":\"ColorBar\"},{\"attributes\":{},\"id\":\"3224\",\"type\":\"SaveTool\"},{\"attributes\":{\"factors\":[\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\"]},\"id\":\"3212\",\"type\":\"FactorRange\"},{\"attributes\":{},\"id\":\"3222\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3244\"},\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3222\"}},\"id\":\"3221\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3242\"},\"major_label_orientation\":1.5707963267948966,\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3219\"}},\"id\":\"3218\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"axis\":{\"id\":\"3218\"},\"grid_line_color\":null,\"ticker\":null},\"id\":\"3220\",\"type\":\"Grid\"},{\"attributes\":{\"text\":\"\"},\"id\":\"3240\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"3245\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"3242\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{},\"id\":\"3216\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3244\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"3224\"},{\"id\":\"3225\"}]},\"id\":\"3226\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"3214\",\"type\":\"CategoricalScale\"},{\"attributes\":{\"callback\":null,\"tooltips\":[[\"SRA accessions\",\"@N\"],[\"BioProjects\",\"@P\"]]},\"id\":\"3225\",\"type\":\"HoverTool\"},{\"attributes\":{},\"id\":\"3246\",\"type\":\"UnionRenderers\"}],\"root_ids\":[\"3209\"]},\"title\":\"Bokeh Application\",\"version\":\"2.0.0\"}};\n",
" var render_items = [{\"docid\":\"c58613ee-1a5a-4479-99e2-441891cef0d6\",\"root_ids\":[\"3209\"],\"roots\":{\"3209\":\"3e142481-48c2-4c2a-809e-8822fe0f9dc9\"}}];\n",
" root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n",
"\n",
" }\n",
" if (root.Bokeh !== undefined) {\n",
" embed_document(root);\n",
" } else {\n",
" var attempts = 0;\n",
" var timer = setInterval(function(root) {\n",
" if (root.Bokeh !== undefined) {\n",
" clearInterval(timer);\n",
" embed_document(root);\n",
" } else {\n",
" attempts++;\n",
" if (attempts > 100) {\n",
" clearInterval(timer);\n",
" console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n",
" }\n",
" }\n",
" }, 10, root)\n",
" }\n",
"})(window);"
],
"application/vnd.bokehjs_exec.v0+json": ""
},
"metadata": {
"tags": [],
"application/vnd.bokehjs_exec.v0+json": {
"id": "3209"
}
}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RCjkpNswK-2O",
"outputId": "3de3d21a-e55d-4f56-e550-9f6401284cbb"
},
"source": [
"# Deploy to datapane\n",
"r.publish(name='SRA stats by Platform and Library Type', open=True)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Publishing report and associated data - please wait..\n",
"Report successfully published at https://datapane.com/u/nekrut/reports/sra-stats-by-platform-and-library-type/\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "GADMAmNks6Ei",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"outputId": "c3175c1b-5302-45ec-acc6-d78b43e56d9d"
},
"source": [
"counts.pivot(index='LibraryStrategy',columns='Platform',values='N')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th>Platform</th><th>BGISEQ</th><th>CAPILLARY</th><th>ILLUMINA</th><th>ION_TORRENT</th><th>OXFORD_NANOPORE</th><th>PACBIO_SMRT</th></tr>\n",
"    <tr><th>LibraryStrategy</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>AMPLICON</th><td>21.0</td><td>7.0</td><td>149668.0</td><td>435.0</td><td>25754.0</td><td>12.0</td></tr>\n",
"    <tr><th>OTHER</th><td>NaN</td><td>NaN</td><td>148.0</td><td>NaN</td><td>4.0</td><td>NaN</td></tr>\n",
"    <tr><th>RNA-Seq</th><td>1.0</td><td>NaN</td><td>4434.0</td><td>42.0</td><td>10.0</td><td>NaN</td></tr>\n",
"    <tr><th>Synthetic-Long-Read</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2.0</td></tr>\n",
"    <tr><th>Targeted-Capture</th><td>NaN</td><td>NaN</td><td>1690.0</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>WGA</th><td>1.0</td><td>NaN</td><td>377.0</td><td>NaN</td><td>580.0</td><td>NaN</td></tr>\n",
"    <tr><th>WGS</th><td>NaN</td><td>NaN</td><td>6201.0</td><td>33.0</td><td>936.0</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Platform BGISEQ CAPILLARY ILLUMINA ION_TORRENT OXFORD_NANOPORE PACBIO_SMRT\n",
"LibraryStrategy \n",
"AMPLICON 21.0 7.0 149668.0 435.0 25754.0 12.0\n",
"OTHER NaN NaN 148.0 NaN 4.0 NaN\n",
"RNA-Seq 1.0 NaN 4434.0 42.0 10.0 NaN\n",
"Synthetic-Long-Read NaN NaN NaN NaN NaN 2.0\n",
"Targeted-Capture NaN NaN 1690.0 NaN NaN NaN\n",
"WGA 1.0 NaN 377.0 NaN 580.0 NaN\n",
"WGS NaN NaN 6201.0 33.0 936.0 NaN"
]
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UDkJYKd8nu4z"
},
"source": [
"Individual SRA runs are organized into SRAStudies or BioProjects:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bNvCsmIllQrG",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "fdcb6491-44ba-4f35-e0fc-a9159b7f8c81"
},
"source": [
"pysqldf('select SRAStudy, count(*) as N from ncbi group by SRAStudy order by N desc').head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>SRAStudy</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>ERP121228</td><td>69982</td></tr>\n",
"    <tr><th>1</th><td>SRP253798</td><td>10840</td></tr>\n",
"    <tr><th>2</th><td>SRP253926</td><td>2109</td></tr>\n",
"    <tr><th>3</th><td>SRP276904</td><td>1536</td></tr>\n",
"    <tr><th>4</th><td>SRP266465</td><td>1486</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRAStudy N\n",
"0 ERP121228 69982\n",
"1 SRP253798 10840\n",
"2 SRP253926 2109\n",
"3 SRP276904 1536\n",
"4 SRP266465 1486"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "NzYXpCD9kxla",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "e22723b0-32df-4062-d384-b5e6e9c868ab"
},
"source": [
"pysqldf('select BioProject, count(*) as N from ncbi group by BioProject order by N desc').head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>BioProject</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>PRJEB37886</td><td>69982</td></tr>\n",
"    <tr><th>1</th><td>PRJNA613958</td><td>10840</td></tr>\n",
"    <tr><th>2</th><td>PRJNA614995</td><td>2109</td></tr>\n",
"    <tr><th>3</th><td>PRJNA655577</td><td>1536</td></tr>\n",
"    <tr><th>4</th><td>PRJNA622837</td><td>1486</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" BioProject N\n",
"0 PRJEB37886 69982\n",
"1 PRJNA613958 10840\n",
"2 PRJNA614995 2109\n",
"3 PRJNA655577 1536\n",
"4 PRJNA622837 1486"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "slEdrloZ4oEE"
},
"source": [
"top_rnaseq = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' and LibraryLayout = 'PAIRED' and LibraryStrategy = 'RNA-Seq' group by BioProject order by N desc limit 10\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OhZrJDvic63t"
},
"source": [
"top_amp_ill = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' and LibraryLayout = 'PAIRED' and LibraryStrategy = 'AMPLICON' group by BioProject order by N desc limit 10\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "DJeAjfuLc6-E"
},
"source": [
"top_ont_amp = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'OXFORD_NANOPORE' and LibraryStrategy = 'AMPLICON' group by BioProject order by N desc limit 10\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sHje2tH-ftjm",
"outputId": "5a9ac22c-eb0d-416f-f98b-4dcc3e6ea5d2"
},
"source": [
"print(pd.concat([top_rnaseq,top_amp_ill,top_ont_amp],axis=1).to_markdown(index=False))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"| BioProject | N | BioProject | N | BioProject | N |\n",
"|:-------------|-----:|:-------------|-------:|:-------------|------:|\n",
"| PRJNA622837 | 1564 | PRJEB37886 | 104984 | PRJEB37886 | 20968 |\n",
"| PRJNA612578 | 964 | PRJNA613958 | 14860 | PRJEB40277 | 1130 |\n",
"| PRJNA650245 | 617 | PRJNA614995 | 3967 | PRJEB39014 | 944 |\n",
"| PRJNA610428 | 42 | PRJNA645906 | 2286 | PRJEB38388 | 584 |\n",
"| PRJEB38546 | 26 | PRJNA639066 | 1931 | PRJEB39487 | 339 |\n",
"| PRJNA634356 | 25 | PRJNA625551 | 1163 | PRJNA669043 | 255 |\n",
"| PRJNA650134 | 22 | PRJNA656534 | 567 | PRJNA669553 | 228 |\n",
"| PRJNA661544 | 15 | PRJNA686984 | 543 | PRJNA650037 | 210 |\n",
"| PRJNA638211 | 10 | PRJEB38723 | 542 | PRJNA645970 | 173 |\n",
"| PRJNA605983 | 9 | PRJEB42024 | 539 | PRJNA610248 | 162 |\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D8Qb9Fvr4psL"
},
"source": [
"For the moment, let us restrict ourselves to Illumina data that has a paired-end library layout and is not amplicon-based:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "yj0S3_pJ4pKo",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "74740444-343e-47c7-a316-7b0d0cb42739"
},
"source": [
"pysqldf('select BioProject,count(*) as N from ncbi_il_pe_nonAmp group by BioProject order by N desc')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>BioProject</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>PRJNA631061</td><td>2829</td></tr>\n",
"    <tr><th>1</th><td>PRJNA622837</td><td>1564</td></tr>\n",
"    <tr><th>2</th><td>PRJNA612578</td><td>964</td></tr>\n",
"    <tr><th>3</th><td>PRJNA650245</td><td>864</td></tr>\n",
"    <tr><th>4</th><td>PRJEB37513</td><td>244</td></tr>\n",
"    <tr><th>5</th><td>PRJEB37886</td><td>197</td></tr>\n",
"    <tr><th>6</th><td>PRJEB39761</td><td>193</td></tr>\n",
"    <tr><th>7</th><td>PRJNA691556</td><td>120</td></tr>\n",
"    <tr><th>8</th><td>PRJNA667180</td><td>102</td></tr>\n",
"    <tr><th>9</th><td>PRJNA669945</td><td>69</td></tr>\n",
"    <tr><th>10</th><td>PRJNA675840</td><td>66</td></tr>\n",
"    <tr><th>11</th><td>PRJNA681020</td><td>57</td></tr>\n",
"    <tr><th>12</th><td>PRJNA683873</td><td>57</td></tr>\n",
"    <tr><th>13</th><td>PRJNA648306</td><td>56</td></tr>\n",
"    <tr><th>14</th><td>PRJNA649101</td><td>48</td></tr>\n",
"    <tr><th>15</th><td>PRJNA632475</td><td>47</td></tr>\n",
"    <tr><th>16</th><td>PRJNA639864</td><td>46</td></tr>\n",
"    <tr><th>17</th><td>PRJNA610428</td><td>42</td></tr>\n",
"    <tr><th>18</th><td>PRJNA687366</td><td>34</td></tr>\n",
"    <tr><th>19</th><td>PRJNA662684</td><td>33</td></tr>\n",
"    <tr><th>20</th><td>PRJNA645052</td><td>30</td></tr>\n",
"    <tr><th>21</th><td>PRJNA682223</td><td>27</td></tr>\n",
"    <tr><th>22</th><td>PRJEB38546</td><td>26</td></tr>\n",
"    <tr><th>23</th><td>PRJNA634356</td><td>25</td></tr>\n",
"    <tr><th>24</th><td>PRJNA659293</td><td>23</td></tr>\n",
"    <tr><th>25</th><td>PRJNA650134</td><td>22</td></tr>\n",
"    <tr><th>26</th><td>PRJNA645048</td><td>21</td></tr>\n",
"    <tr><th>27</th><td>PRJNA682013</td><td>20</td></tr>\n",
"    <tr><th>28</th><td>PRJNA645051</td><td>17</td></tr>\n",
"    <tr><th>29</th><td>PRJEB40188</td><td>16</td></tr>\n",
"    <tr><th>30</th><td>PRJNA661544</td><td>15</td></tr>\n",
"    <tr><th>31</th><td>PRJNA663402</td><td>14</td></tr>\n",
"    <tr><th>32</th><td>PRJNA682735</td><td>14</td></tr>\n",
"    <tr><th>33</th><td>PRJNA631042</td><td>12</td></tr>\n",
"    <tr><th>34</th><td>PRJNA666696</td><td>12</td></tr>\n",
"    <tr><th>35</th><td>PRJNA638211</td><td>10</td></tr>\n",
"    <tr><th>36</th><td>PRJNA692653</td><td>10</td></tr>\n",
"    <tr><th>37</th><td>PRJNA605983</td><td>9</td></tr>\n",
"    <tr><th>38</th><td>PRJNA605907</td><td>8</td></tr>\n",
"    <tr><th>39</th><td>PRJNA672811</td><td>8</td></tr>\n",
"    <tr><th>40</th><td>PRJNA616446</td><td>7</td></tr>\n",
"    <tr><th>41</th><td>PRJNA668631</td><td>7</td></tr>\n",
"    <tr><th>42</th><td>PRJNA679456</td><td>7</td></tr>\n",
"    <tr><th>43</th><td>PRJEB38101</td><td>6</td></tr>\n",
"    <tr><th>44</th><td>PRJEB38351</td><td>6</td></tr>\n",
"    <tr><th>45</th><td>PRJNA615319</td><td>6</td></tr>\n",
"    <tr><th>46</th><td>PRJNA627354</td><td>6</td></tr>\n",
"    <tr><th>47</th><td>PRJNA635443</td><td>6</td></tr>\n",
"    <tr><th>48</th><td>PRJEB39632</td><td>5</td></tr>\n",
"    <tr><th>49</th><td>PRJNA628043</td><td>5</td></tr>\n",
"    <tr><th>50</th><td>PRJNA639591</td><td>5</td></tr>\n",
"    <tr><th>51</th><td>PRJNA645906</td><td>5</td></tr>\n",
"    <tr><th>52</th><td>PRJNA673055</td><td>5</td></tr>\n",
"    <tr><th>53</th><td>PRJNA607948</td><td>4</td></tr>\n",
"    <tr><th>54</th><td>PRJNA627977</td><td>4</td></tr>\n",
"    <tr><th>55</th><td>PRJNA634194</td><td>4</td></tr>\n",
"    <tr><th>56</th><td>PRJNA636446</td><td>4</td></tr>\n",
"    <tr><th>57</th><td>PRJNA645342</td><td>4</td></tr>\n",
"    <tr><th>58</th><td>PRJNA623001</td><td>3</td></tr>\n",
"    <tr><th>59</th><td>PRJNA644357</td><td>3</td></tr>\n",
"    <tr><th>60</th><td>PRJNA668889</td><td>3</td></tr>\n",
"    <tr><th>61</th><td>PRJNA231221</td><td>2</td></tr>\n",
"    <tr><th>62</th><td>PRJNA624358</td><td>2</td></tr>\n",
"    <tr><th>63</th><td>PRJNA624792</td><td>2</td></tr>\n",
"    <tr><th>64</th><td>PRJNA643574</td><td>2</td></tr>\n",
"    <tr><th>65</th><td>PRJNA657893</td><td>2</td></tr>\n",
"    <tr><th>66</th><td>PRJNA663861</td><td>2</td></tr>\n",
"    <tr><th>67</th><td>PRJNA669553</td><td>2</td></tr>\n",
"    <tr><th>68</th><td>PRJNA674796</td><td>2</td></tr>\n",
"    <tr><th>69</th><td>PRJNA681038</td><td>2</td></tr>\n",
"    <tr><th>70</th><td>PRJEB38459</td><td>1</td></tr>\n",
"    <tr><th>71</th><td>PRJEB39737</td><td>1</td></tr>\n",
"    <tr><th>72</th><td>PRJEB41216</td><td>1</td></tr>\n",
"    <tr><th>73</th><td>PRJNA608651</td><td>1</td></tr>\n",
"    <tr><th>74</th><td>PRJNA623797</td><td>1</td></tr>\n",
"    <tr><th>75</th><td>PRJNA623895</td><td>1</td></tr>\n",
"    <tr><th>76</th><td>PRJNA624231</td><td>1</td></tr>\n",
"    <tr><th>77</th><td>PRJNA625669</td><td>1</td></tr>\n",
"    <tr><th>78</th><td>PRJNA626526</td><td>1</td></tr>\n",
"    <tr><th>79</th><td>PRJNA630716</td><td>1</td></tr>\n",
"    <tr><th>80</th><td>PRJNA633241</td><td>1</td></tr>\n",
"    <tr><th>81</th><td>PRJNA635017</td><td>1</td></tr>\n",
"    <tr><th>82</th><td>PRJNA636004</td><td>1</td></tr>\n",
"    <tr><th>83</th><td>PRJNA637892</td><td>1</td></tr>\n",
"    <tr><th>84</th><td>PRJNA647057</td><td>1</td></tr>\n",
"    <tr><th>85</th><td>PRJNA657938</td><td>1</td></tr>\n",
"    <tr><th>86</th><td>PRJNA657985</td><td>1</td></tr>\n",
"    <tr><th>87</th><td>PRJNA658211</td><td>1</td></tr>\n",
"    <tr><th>88</th><td>PRJNA658242</td><td>1</td></tr>\n",
"    <tr><th>89</th><td>PRJNA666189</td><td>1</td></tr>\n",
"    <tr><th>90</th><td>PRJNA679786</td><td>1</td></tr>\n",
"    <tr><th>91</th><td>PRJNA689000</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" BioProject N\n",
"0 PRJNA631061 2829\n",
"1 PRJNA622837 1564\n",
"2 PRJNA612578 964\n",
"3 PRJNA650245 864\n",
"4 PRJEB37513 244\n",
"5 PRJEB37886 197\n",
"6 PRJEB39761 193\n",
"7 PRJNA691556 120\n",
"8 PRJNA667180 102\n",
"9 PRJNA669945 69\n",
"10 PRJNA675840 66\n",
"11 PRJNA681020 57\n",
"12 PRJNA683873 57\n",
"13 PRJNA648306 56\n",
"14 PRJNA649101 48\n",
"15 PRJNA632475 47\n",
"16 PRJNA639864 46\n",
"17 PRJNA610428 42\n",
"18 PRJNA687366 34\n",
"19 PRJNA662684 33\n",
"20 PRJNA645052 30\n",
"21 PRJNA682223 27\n",
"22 PRJEB38546 26\n",
"23 PRJNA634356 25\n",
"24 PRJNA659293 23\n",
"25 PRJNA650134 22\n",
"26 PRJNA645048 21\n",
"27 PRJNA682013 20\n",
"28 PRJNA645051 17\n",
"29 PRJEB40188 16\n",
"30 PRJNA661544 15\n",
"31 PRJNA663402 14\n",
"32 PRJNA682735 14\n",
"33 PRJNA631042 12\n",
"34 PRJNA666696 12\n",
"35 PRJNA638211 10\n",
"36 PRJNA692653 10\n",
"37 PRJNA605983 9\n",
"38 PRJNA605907 8\n",
"39 PRJNA672811 8\n",
"40 PRJNA616446 7\n",
"41 PRJNA668631 7\n",
"42 PRJNA679456 7\n",
"43 PRJEB38101 6\n",
"44 PRJEB38351 6\n",
"45 PRJNA615319 6\n",
"46 PRJNA627354 6\n",
"47 PRJNA635443 6\n",
"48 PRJEB39632 5\n",
"49 PRJNA628043 5\n",
"50 PRJNA639591 5\n",
"51 PRJNA645906 5\n",
"52 PRJNA673055 5\n",
"53 PRJNA607948 4\n",
"54 PRJNA627977 4\n",
"55 PRJNA634194 4\n",
"56 PRJNA636446 4\n",
"57 PRJNA645342 4\n",
"58 PRJNA623001 3\n",
"59 PRJNA644357 3\n",
"60 PRJNA668889 3\n",
"61 PRJNA231221 2\n",
"62 PRJNA624358 2\n",
"63 PRJNA624792 2\n",
"64 PRJNA643574 2\n",
"65 PRJNA657893 2\n",
"66 PRJNA663861 2\n",
"67 PRJNA669553 2\n",
"68 PRJNA674796 2\n",
"69 PRJNA681038 2\n",
"70 PRJEB38459 1\n",
"71 PRJEB39737 1\n",
"72 PRJEB41216 1\n",
"73 PRJNA608651 1\n",
"74 PRJNA623797 1\n",
"75 PRJNA623895 1\n",
"76 PRJNA624231 1\n",
"77 PRJNA625669 1\n",
"78 PRJNA626526 1\n",
"79 PRJNA630716 1\n",
"80 PRJNA633241 1\n",
"81 PRJNA635017 1\n",
"82 PRJNA636004 1\n",
"83 PRJNA637892 1\n",
"84 PRJNA647057 1\n",
"85 PRJNA657938 1\n",
"86 PRJNA657985 1\n",
"87 PRJNA658211 1\n",
"88 PRJNA658242 1\n",
"89 PRJNA666189 1\n",
"90 PRJNA679786 1\n",
"91 PRJNA689000 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "fGMN6u0Bgc8l",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 564
},
"outputId": "43508bf2-3b7d-41cd-d396-0b42969b5d14"
},
"source": [
"pysqldf(\"select * from ncbi_il_pe_nonAmp where BioProject = 'PRJNA622837'\").head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Run</th><th>ReleaseDate</th><th>LoadDate</th><th>spots</th><th>bases</th><th>spots_with_mates</th><th>avgLength</th><th>size_MB</th><th>AssemblyName</th><th>download_path</th><th>Experiment</th><th>LibraryName</th><th>LibraryStrategy</th><th>LibrarySelection</th><th>LibrarySource</th><th>LibraryLayout</th><th>InsertSize</th><th>InsertDev</th><th>Platform</th><th>Model</th><th>SRAStudy</th><th>BioProject</th><th>Study_Pubmed_id</th><th>ProjectID</th><th>Sample</th><th>BioSample</th><th>SampleType</th><th>TaxID</th><th>ScientificName</th><th>SampleName</th><th>g1k_pop_code</th><th>source</th><th>g1k_analysis_group</th><th>Subject_ID</th><th>Sex</th><th>Disease</th><th>Tumor</th><th>Affection_Status</th><th>Analyte_Type</th><th>Histological_Type</th><th>Body_Site</th><th>CenterName</th><th>Submission</th><th>dbgap_study_accession</th><th>Consent</th><th>RunHash</th><th>ReadHash</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>SRR12733944</td><td>2020-09-28 21:32:30</td><td>2020-09-28 17:11:02</td><td>838619</td><td>169401038</td><td>838619</td><td>202</td><td>54</td><td>None</td><td>https://sra-download.ncbi.nlm.nih.gov/traces/s...</td><td>SRX9207340</td><td>SAMN15751626_ERCC-00162_SSIII_Random_Hexamers_...</td><td>RNA-Seq</td><td>cDNA</td><td>VIRAL RNA</td><td>PAIRED</td><td>0</td><td>0</td><td>ILLUMINA</td><td>Illumina NovaSeq 6000</td><td>SRP266465</td><td>PRJNA622837</td><td>None</td><td>622837</td><td>SRS7442251</td><td>SAMN15751626</td><td>simple</td><td>2697049</td><td>Severe acute respiratory syndrome coronavirus 2</td><td>MA_DPH_00008</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>no</td><td>None</td><td>None</td><td>None</td><td>None</td><td>BROAD INSTITUTE OF HARVARD AND MIT</td><td>SRA1133927</td><td>None</td><td>public</td><td>4DD156B9C9843DCE60F02F2499657030</td><td>BCE66AD150CA6DE46FCDE0276A29C7D4</td></tr>\n",
"    <tr><th>1</th><td>SRR12733974</td><td>2020-09-28 21:32:30</td><td>2020-09-28 17:11:18</td><td>1766330</td><td>356798660</td><td>1766330</td><td>202</td><td>121</td><td>None</td><td>https://sra-download.ncbi.nlm.nih.gov/traces/s...</td><td>SRX9207310</td><td>SAMN15751630_ERCC-00022_RandomPrimer-SSIV_Next...</td><td>RNA-Seq</td><td>cDNA</td><td>VIRAL RNA</td><td>PAIRED</td><td>0</td><td>0</td><td>ILLUMINA</td><td>Illumina NovaSeq 6000</td><td>SRP266465</td><td>PRJNA622837</td><td>None</td><td>622837</td><td>SRS7442220</td><td>SAMN15751630</td><td>simple</td><td>2697049</td><td>Severe acute respiratory syndrome coronavirus 2</td><td>MA_DPH_00012</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>no</td><td>None</td><td>None</td><td>None</td><td>None</td><td>BROAD INSTITUTE OF HARVARD AND MIT</td><td>SRA1133927</td><td>None</td><td>public</td><td>23BDDDEB5E2875CA2CF1838AC088D99C</td><td>001DF0E80C7167DAB21F6DAB4B58EBFE</td></tr>\n",
"    <tr><th>2</th><td>SRR12733963</td><td>2020-09-28 21:32:30</td><td>2020-09-28 17:11:16</td><td>2016051</td><td>407242302</td><td>2016051</td><td>202</td><td>137</td><td>None</td><td>https://sra-download.ncbi.nlm.nih.gov/traces/s...</td><td>SRX9207321</td><td>SAMN15751631_ERCC-00042_RandomPrimer-SSIV_Next...</td><td>RNA-Seq</td><td>cDNA</td><td>VIRAL RNA</td><td>PAIRED</td><td>0</td><td>0</td><td>ILLUMINA</td><td>Illumina NovaSeq 6000</td><td>SRP266465</td><td>PRJNA622837</td><td>None</td><td>622837</td><td>SRS7442231</td><td>SAMN15751631</td><td>simple</td><td>2697049</td><td>Severe acute respiratory syndrome coronavirus 2</td><td>MA_DPH_00013</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>no</td><td>None</td><td>None</td><td>None</td><td>None</td><td>BROAD INSTITUTE OF HARVARD AND MIT</td><td>SRA1133927</td><td>None</td><td>public</td><td>4518EFAB47FC9AA6ED115887876ACAAC</td><td>94696946332E59F2316E819B16F8AAB2</td></tr>\n",
"    <tr><th>3</th><td>SRR12733928</td><td>2020-09-28 21:32:31</td><td>2020-09-28 17:11:22</td><td>2538944</td><td>512866688</td><td>2538944</td><td>202</td><td>178</td><td>None</td><td>https://sra-download.ncbi.nlm.nih.gov/traces/s...</td><td>SRX9207356</td><td>SAMN15751632_ERCC-00061_RandomPrimer-SSIV_Next...</td><td>RNA-Seq</td><td>cDNA</td><td>VIRAL RNA</td><td>PAIRED</td><td>0</td><td>0</td><td>ILLUMINA</td><td>Illumina NovaSeq 6000</td><td>SRP266465</td><td>PRJNA622837</td><td>None</td><td>622837</td><td>SRS7442266</td><td>SAMN15751632</td><td>simple</td><td>2697049</td><td>Severe acute respiratory syndrome coronavirus 2</td><td>MA_DPH_00014</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>no</td><td>None</td><td>None</td><td>None</td><td>None</td><td>BROAD INSTITUTE OF HARVARD AND MIT</td><td>SRA1133927</td><td>None</td><td>public</td><td>0EDB75C66797A6F08A49823E14728F93</td><td>7AEC7442D374D08AE76DC6EB65EF9D02</td></tr>\n",
"    <tr><th>4</th><td>SRR12733917</td><td>2020-09-28 21:32:31</td><td>2020-09-28 17:11:13</td><td>2442981</td><td>493482162</td><td>2442981</td><td>202</td><td>162</td><td>None</td><td>https://sra-download.ncbi.nlm.nih.gov/traces/s...</td><td>SRX9207367</td><td>SAMN15751633_ERCC-00081_RandomPrimer-SSIV_Next...</td><td>RNA-Seq</td><td>cDNA</td><td>VIRAL RNA</td><td>PAIRED</td><td>0</td><td>0</td><td>ILLUMINA</td><td>Illumina NovaSeq 6000</td><td>SRP266465</td><td>PRJNA622837</td><td>None</td><td>622837</td><td>SRS7442279</td><td>SAMN15751633</td><td>simple</td><td>2697049</td><td>Severe acute respiratory syndrome coronavirus 2</td><td>MA_DPH_00015</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>no</td><td>None</td><td>None</td><td>None</td><td>None</td><td>BROAD INSTITUTE OF HARVARD AND MIT</td><td>SRA1133927</td><td>None</td><td>public</td><td>DB050D2009449E5EB2C0787C0AAE7FA5</td><td>E1A7064BE27351B0E4EA94EBD94118F0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n",
"0 SRR12733944 2020-09-28 21:32:30 2020-09-28 17:11:02 838619 169401038 838619 202 54 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207340 SAMN15751626_ERCC-00162_SSIII_Random_Hexamers_... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442251 SAMN15751626 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00008 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 4DD156B9C9843DCE60F02F2499657030 BCE66AD150CA6DE46FCDE0276A29C7D4\n",
"1 SRR12733974 2020-09-28 21:32:30 2020-09-28 17:11:18 1766330 356798660 1766330 202 121 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207310 SAMN15751630_ERCC-00022_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442220 SAMN15751630 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00012 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 23BDDDEB5E2875CA2CF1838AC088D99C 001DF0E80C7167DAB21F6DAB4B58EBFE\n",
"2 SRR12733963 2020-09-28 21:32:30 2020-09-28 17:11:16 2016051 407242302 2016051 202 137 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207321 SAMN15751631_ERCC-00042_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442231 SAMN15751631 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00013 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 4518EFAB47FC9AA6ED115887876ACAAC 94696946332E59F2316E819B16F8AAB2\n",
"3 SRR12733928 2020-09-28 21:32:31 2020-09-28 17:11:22 2538944 512866688 2538944 202 178 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207356 SAMN15751632_ERCC-00061_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442266 SAMN15751632 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00014 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 0EDB75C66797A6F08A49823E14728F93 7AEC7442D374D08AE76DC6EB65EF9D02\n",
"4 SRR12733917 2020-09-28 21:32:31 2020-09-28 17:11:13 2442981 493482162 2442981 202 162 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207367 SAMN15751633_ERCC-00081_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442279 SAMN15751633 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00015 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public DB050D2009449E5EB2C0787C0AAE7FA5 E1A7064BE27351B0E4EA94EBD94118F0"
]
},
"metadata": {
"tags": []
},
"execution_count": 16
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "u2aqtENrpxZ6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 111
},
"outputId": "71e748ab-4553-4884-bc52-b3a0fdabbf84"
},
"source": [
"pysqldf('select Model, count(*) as N from ncbi_il_pe_nonAmp where BioProject = \"PRJNA622837\" group by Model').head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Model | \n",
" N | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Illumina HiSeq 2500 | \n",
" 60 | \n",
"
\n",
" \n",
" 1 | \n",
" Illumina NovaSeq 6000 | \n",
" 1426 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Model N\n",
"0 Illumina HiSeq 2500 60\n",
"1 Illumina NovaSeq 6000 1426"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "bteaPrVSqbFR",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "93f3822c-408b-484e-e54f-d37b9f48b0e2"
},
"source": [
"pysqldf('select SampleName, count(*) as N from ncbi_il_pe_nonAmp where BioProject = \"PRJNA622837\" group by SampleName order by N desc').head(100)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" SampleName | \n",
" N | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" MA_MGH_00001 | \n",
" 6 | \n",
"
\n",
" \n",
" 1 | \n",
" MA_MGH_00002 | \n",
" 6 | \n",
"
\n",
" \n",
" 2 | \n",
" MA_MGH_00003 | \n",
" 6 | \n",
"
\n",
" \n",
" 3 | \n",
" MA_MGH_00004 | \n",
" 6 | \n",
"
\n",
" \n",
" 4 | \n",
" MA_MGH_00005 | \n",
" 6 | \n",
"
\n",
" \n",
" 5 | \n",
" MA_MGH_00006 | \n",
" 6 | \n",
"
\n",
" \n",
" 6 | \n",
" MA_MGH_00007 | \n",
" 6 | \n",
"
\n",
" \n",
" 7 | \n",
" MA_MGH_00008 | \n",
" 6 | \n",
"
\n",
" \n",
" 8 | \n",
" MA_MGH_00009 | \n",
" 6 | \n",
"
\n",
" \n",
" 9 | \n",
" MA_MGH_00010 | \n",
" 6 | \n",
"
\n",
" \n",
" 10 | \n",
" MA_DPH_00001 | \n",
" 1 | \n",
"
\n",
" \n",
" 11 | \n",
" MA_DPH_00002 | \n",
" 1 | \n",
"
\n",
" \n",
" 12 | \n",
" MA_DPH_00003 | \n",
" 1 | \n",
"
\n",
" \n",
" 13 | \n",
" MA_DPH_00004 | \n",
" 1 | \n",
"
\n",
" \n",
" 14 | \n",
" MA_DPH_00005 | \n",
" 1 | \n",
"
\n",
" \n",
" 15 | \n",
" MA_DPH_00006 | \n",
" 1 | \n",
"
\n",
" \n",
" 16 | \n",
" MA_DPH_00007 | \n",
" 1 | \n",
"
\n",
" \n",
" 17 | \n",
" MA_DPH_00008 | \n",
" 1 | \n",
"
\n",
" \n",
" 18 | \n",
" MA_DPH_00009 | \n",
" 1 | \n",
"
\n",
" \n",
" 19 | \n",
" MA_DPH_00010 | \n",
" 1 | \n",
"
\n",
" \n",
" 20 | \n",
" MA_DPH_00011 | \n",
" 1 | \n",
"
\n",
" \n",
" 21 | \n",
" MA_DPH_00012 | \n",
" 1 | \n",
"
\n",
" \n",
" 22 | \n",
" MA_DPH_00013 | \n",
" 1 | \n",
"
\n",
" \n",
" 23 | \n",
" MA_DPH_00014 | \n",
" 1 | \n",
"
\n",
" \n",
" 24 | \n",
" MA_DPH_00015 | \n",
" 1 | \n",
"
\n",
" \n",
" 25 | \n",
" MA_DPH_00016 | \n",
" 1 | \n",
"
\n",
" \n",
" 26 | \n",
" MA_DPH_00017 | \n",
" 1 | \n",
"
\n",
" \n",
" 27 | \n",
" MA_DPH_00018 | \n",
" 1 | \n",
"
\n",
" \n",
" 28 | \n",
" MA_DPH_00019 | \n",
" 1 | \n",
"
\n",
" \n",
" 29 | \n",
" MA_DPH_00020 | \n",
" 1 | \n",
"
\n",
" \n",
" 30 | \n",
" MA_DPH_00021 | \n",
" 1 | \n",
"
\n",
" \n",
" 31 | \n",
" MA_DPH_00022 | \n",
" 1 | \n",
"
\n",
" \n",
" 32 | \n",
" MA_DPH_00023 | \n",
" 1 | \n",
"
\n",
" \n",
" 33 | \n",
" MA_DPH_00024 | \n",
" 1 | \n",
"
\n",
" \n",
" 34 | \n",
" MA_DPH_00025 | \n",
" 1 | \n",
"
\n",
" \n",
" 35 | \n",
" MA_DPH_00026 | \n",
" 1 | \n",
"
\n",
" \n",
" 36 | \n",
" MA_DPH_00027 | \n",
" 1 | \n",
"
\n",
" \n",
" 37 | \n",
" MA_DPH_00028 | \n",
" 1 | \n",
"
\n",
" \n",
" 38 | \n",
" MA_DPH_00029 | \n",
" 1 | \n",
"
\n",
" \n",
" 39 | \n",
" MA_DPH_00030 | \n",
" 1 | \n",
"
\n",
" \n",
" 40 | \n",
" MA_DPH_00031 | \n",
" 1 | \n",
"
\n",
" \n",
" 41 | \n",
" MA_DPH_00032 | \n",
" 1 | \n",
"
\n",
" \n",
" 42 | \n",
" MA_DPH_00033 | \n",
" 1 | \n",
"
\n",
" \n",
" 43 | \n",
" MA_DPH_00034 | \n",
" 1 | \n",
"
\n",
" \n",
" 44 | \n",
" MA_DPH_00035 | \n",
" 1 | \n",
"
\n",
" \n",
" 45 | \n",
" MA_DPH_00036 | \n",
" 1 | \n",
"
\n",
" \n",
" 46 | \n",
" MA_DPH_00037 | \n",
" 1 | \n",
"
\n",
" \n",
" 47 | \n",
" MA_DPH_00038 | \n",
" 1 | \n",
"
\n",
" \n",
" 48 | \n",
" MA_DPH_00039 | \n",
" 1 | \n",
"
\n",
" \n",
" 49 | \n",
" MA_DPH_00040 | \n",
" 1 | \n",
"
\n",
" \n",
" 50 | \n",
" MA_DPH_00041 | \n",
" 1 | \n",
"
\n",
" \n",
" 51 | \n",
" MA_DPH_00042 | \n",
" 1 | \n",
"
\n",
" \n",
" 52 | \n",
" MA_DPH_00043 | \n",
" 1 | \n",
"
\n",
" \n",
" 53 | \n",
" MA_DPH_00044 | \n",
" 1 | \n",
"
\n",
" \n",
" 54 | \n",
" MA_DPH_00045 | \n",
" 1 | \n",
"
\n",
" \n",
" 55 | \n",
" MA_DPH_00046 | \n",
" 1 | \n",
"
\n",
" \n",
" 56 | \n",
" MA_DPH_00047 | \n",
" 1 | \n",
"
\n",
" \n",
" 57 | \n",
" MA_DPH_00048 | \n",
" 1 | \n",
"
\n",
" \n",
" 58 | \n",
" MA_DPH_00049 | \n",
" 1 | \n",
"
\n",
" \n",
" 59 | \n",
" MA_DPH_00050 | \n",
" 1 | \n",
"
\n",
" \n",
" 60 | \n",
" MA_DPH_00051 | \n",
" 1 | \n",
"
\n",
" \n",
" 61 | \n",
" MA_DPH_00052 | \n",
" 1 | \n",
"
\n",
" \n",
" 62 | \n",
" MA_DPH_00053 | \n",
" 1 | \n",
"
\n",
" \n",
" 63 | \n",
" MA_DPH_00054 | \n",
" 1 | \n",
"
\n",
" \n",
" 64 | \n",
" MA_DPH_00055 | \n",
" 1 | \n",
"
\n",
" \n",
" 65 | \n",
" MA_DPH_00056 | \n",
" 1 | \n",
"
\n",
" \n",
" 66 | \n",
" MA_DPH_00057 | \n",
" 1 | \n",
"
\n",
" \n",
" 67 | \n",
" MA_DPH_00058 | \n",
" 1 | \n",
"
\n",
" \n",
" 68 | \n",
" MA_DPH_00060 | \n",
" 1 | \n",
"
\n",
" \n",
" 69 | \n",
" MA_DPH_00061 | \n",
" 1 | \n",
"
\n",
" \n",
" 70 | \n",
" MA_DPH_00062 | \n",
" 1 | \n",
"
\n",
" \n",
" 71 | \n",
" MA_DPH_00063 | \n",
" 1 | \n",
"
\n",
" \n",
" 72 | \n",
" MA_DPH_00064 | \n",
" 1 | \n",
"
\n",
" \n",
" 73 | \n",
" MA_DPH_00065 | \n",
" 1 | \n",
"
\n",
" \n",
" 74 | \n",
" MA_DPH_00066 | \n",
" 1 | \n",
"
\n",
" \n",
" 75 | \n",
" MA_DPH_00067 | \n",
" 1 | \n",
"
\n",
" \n",
" 76 | \n",
" MA_DPH_00068 | \n",
" 1 | \n",
"
\n",
" \n",
" 77 | \n",
" MA_DPH_00069 | \n",
" 1 | \n",
"
\n",
" \n",
" 78 | \n",
" MA_DPH_00071 | \n",
" 1 | \n",
"
\n",
" \n",
" 79 | \n",
" MA_DPH_00072 | \n",
" 1 | \n",
"
\n",
" \n",
" 80 | \n",
" MA_DPH_00073 | \n",
" 1 | \n",
"
\n",
" \n",
" 81 | \n",
" MA_DPH_00074 | \n",
" 1 | \n",
"
\n",
" \n",
" 82 | \n",
" MA_DPH_00075 | \n",
" 1 | \n",
"
\n",
" \n",
" 83 | \n",
" MA_DPH_00076 | \n",
" 1 | \n",
"
\n",
" \n",
" 84 | \n",
" MA_DPH_00077 | \n",
" 1 | \n",
"
\n",
" \n",
" 85 | \n",
" MA_DPH_00078 | \n",
" 1 | \n",
"
\n",
" \n",
" 86 | \n",
" MA_DPH_00079 | \n",
" 1 | \n",
"
\n",
" \n",
" 87 | \n",
" MA_DPH_00080 | \n",
" 1 | \n",
"
\n",
" \n",
" 88 | \n",
" MA_DPH_00081 | \n",
" 1 | \n",
"
\n",
" \n",
" 89 | \n",
" MA_DPH_00082 | \n",
" 1 | \n",
"
\n",
" \n",
" 90 | \n",
" MA_DPH_00083 | \n",
" 1 | \n",
"
\n",
" \n",
" 91 | \n",
" MA_DPH_00084 | \n",
" 1 | \n",
"
\n",
" \n",
" 92 | \n",
" MA_DPH_00085 | \n",
" 1 | \n",
"
\n",
" \n",
" 93 | \n",
" MA_DPH_00086 | \n",
" 1 | \n",
"
\n",
" \n",
" 94 | \n",
" MA_DPH_00087 | \n",
" 1 | \n",
"
\n",
" \n",
" 95 | \n",
" MA_DPH_00089 | \n",
" 1 | \n",
"
\n",
" \n",
" 96 | \n",
" MA_DPH_00090 | \n",
" 1 | \n",
"
\n",
" \n",
" 97 | \n",
" MA_DPH_00091 | \n",
" 1 | \n",
"
\n",
" \n",
" 98 | \n",
" MA_DPH_00092 | \n",
" 1 | \n",
"
\n",
" \n",
" 99 | \n",
" MA_DPH_00093 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" SampleName N\n",
"0 MA_MGH_00001 6\n",
"1 MA_MGH_00002 6\n",
"2 MA_MGH_00003 6\n",
"3 MA_MGH_00004 6\n",
"4 MA_MGH_00005 6\n",
"5 MA_MGH_00006 6\n",
"6 MA_MGH_00007 6\n",
"7 MA_MGH_00008 6\n",
"8 MA_MGH_00009 6\n",
"9 MA_MGH_00010 6\n",
"10 MA_DPH_00001 1\n",
"11 MA_DPH_00002 1\n",
"12 MA_DPH_00003 1\n",
"13 MA_DPH_00004 1\n",
"14 MA_DPH_00005 1\n",
"15 MA_DPH_00006 1\n",
"16 MA_DPH_00007 1\n",
"17 MA_DPH_00008 1\n",
"18 MA_DPH_00009 1\n",
"19 MA_DPH_00010 1\n",
"20 MA_DPH_00011 1\n",
"21 MA_DPH_00012 1\n",
"22 MA_DPH_00013 1\n",
"23 MA_DPH_00014 1\n",
"24 MA_DPH_00015 1\n",
"25 MA_DPH_00016 1\n",
"26 MA_DPH_00017 1\n",
"27 MA_DPH_00018 1\n",
"28 MA_DPH_00019 1\n",
"29 MA_DPH_00020 1\n",
"30 MA_DPH_00021 1\n",
"31 MA_DPH_00022 1\n",
"32 MA_DPH_00023 1\n",
"33 MA_DPH_00024 1\n",
"34 MA_DPH_00025 1\n",
"35 MA_DPH_00026 1\n",
"36 MA_DPH_00027 1\n",
"37 MA_DPH_00028 1\n",
"38 MA_DPH_00029 1\n",
"39 MA_DPH_00030 1\n",
"40 MA_DPH_00031 1\n",
"41 MA_DPH_00032 1\n",
"42 MA_DPH_00033 1\n",
"43 MA_DPH_00034 1\n",
"44 MA_DPH_00035 1\n",
"45 MA_DPH_00036 1\n",
"46 MA_DPH_00037 1\n",
"47 MA_DPH_00038 1\n",
"48 MA_DPH_00039 1\n",
"49 MA_DPH_00040 1\n",
"50 MA_DPH_00041 1\n",
"51 MA_DPH_00042 1\n",
"52 MA_DPH_00043 1\n",
"53 MA_DPH_00044 1\n",
"54 MA_DPH_00045 1\n",
"55 MA_DPH_00046 1\n",
"56 MA_DPH_00047 1\n",
"57 MA_DPH_00048 1\n",
"58 MA_DPH_00049 1\n",
"59 MA_DPH_00050 1\n",
"60 MA_DPH_00051 1\n",
"61 MA_DPH_00052 1\n",
"62 MA_DPH_00053 1\n",
"63 MA_DPH_00054 1\n",
"64 MA_DPH_00055 1\n",
"65 MA_DPH_00056 1\n",
"66 MA_DPH_00057 1\n",
"67 MA_DPH_00058 1\n",
"68 MA_DPH_00060 1\n",
"69 MA_DPH_00061 1\n",
"70 MA_DPH_00062 1\n",
"71 MA_DPH_00063 1\n",
"72 MA_DPH_00064 1\n",
"73 MA_DPH_00065 1\n",
"74 MA_DPH_00066 1\n",
"75 MA_DPH_00067 1\n",
"76 MA_DPH_00068 1\n",
"77 MA_DPH_00069 1\n",
"78 MA_DPH_00071 1\n",
"79 MA_DPH_00072 1\n",
"80 MA_DPH_00073 1\n",
"81 MA_DPH_00074 1\n",
"82 MA_DPH_00075 1\n",
"83 MA_DPH_00076 1\n",
"84 MA_DPH_00077 1\n",
"85 MA_DPH_00078 1\n",
"86 MA_DPH_00079 1\n",
"87 MA_DPH_00080 1\n",
"88 MA_DPH_00081 1\n",
"89 MA_DPH_00082 1\n",
"90 MA_DPH_00083 1\n",
"91 MA_DPH_00084 1\n",
"92 MA_DPH_00085 1\n",
"93 MA_DPH_00086 1\n",
"94 MA_DPH_00087 1\n",
"95 MA_DPH_00089 1\n",
"96 MA_DPH_00090 1\n",
"97 MA_DPH_00091 1\n",
"98 MA_DPH_00092 1\n",
"99 MA_DPH_00093 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "c_xq0nKcqnxK",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 663
},
"outputId": "334cdf3d-10d9-46af-a9d8-36082de9a845"
},
"source": [
"pysqldf('select * from ncbi_il_pe_nonAmp where SampleName=\"MA_MGH_00001\"')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Run | \n",
" ReleaseDate | \n",
" LoadDate | \n",
" spots | \n",
" bases | \n",
" spots_with_mates | \n",
" avgLength | \n",
" size_MB | \n",
" AssemblyName | \n",
" download_path | \n",
" Experiment | \n",
" LibraryName | \n",
" LibraryStrategy | \n",
" LibrarySelection | \n",
" LibrarySource | \n",
" LibraryLayout | \n",
" InsertSize | \n",
" InsertDev | \n",
" Platform | \n",
" Model | \n",
" SRAStudy | \n",
" BioProject | \n",
" Study_Pubmed_id | \n",
" ProjectID | \n",
" Sample | \n",
" BioSample | \n",
" SampleType | \n",
" TaxID | \n",
" ScientificName | \n",
" SampleName | \n",
" g1k_pop_code | \n",
" source | \n",
" g1k_analysis_group | \n",
" Subject_ID | \n",
" Sex | \n",
" Disease | \n",
" Tumor | \n",
" Affection_Status | \n",
" Analyte_Type | \n",
" Histological_Type | \n",
" Body_Site | \n",
" CenterName | \n",
" Submission | \n",
" dbgap_study_accession | \n",
" Consent | \n",
" RunHash | \n",
" ReadHash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" SRR11954059 | \n",
" 2020-06-08 14:49:46 | \n",
" 2020-06-08 14:40:42 | \n",
" 42969 | \n",
" 8679738 | \n",
" 42969 | \n",
" 202 | \n",
" 4 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8498570 | \n",
" SAMN14938611_xGen_3_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" AB5FB1C49D30050D7F36A91653D8D0DF | \n",
" 44AC9D5B9AB98C13C71163EBB4ECD7AD | \n",
"
\n",
" \n",
" 1 | \n",
" SRR11954164 | \n",
" 2020-06-08 14:52:39 | \n",
" 2020-06-08 14:41:48 | \n",
" 73987 | \n",
" 14945374 | \n",
" 73987 | \n",
" 202 | \n",
" 7 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8498465 | \n",
" SAMN14938611_xGen_2_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" 7BA7FA8B66CAA6DEEEBB8C11712C96B3 | \n",
" B437DB6AB394C49E0D4C79CF5F153CE6 | \n",
"
\n",
" \n",
" 2 | \n",
" SRR11953812 | \n",
" 2020-06-08 14:49:46 | \n",
" 2020-06-08 14:38:15 | \n",
" 300823 | \n",
" 60766246 | \n",
" 300823 | \n",
" 202 | \n",
" 29 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8498307 | \n",
" SAMN14938611_Next_2_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" E0168B308E9FF193D4C850E01BD038EA | \n",
" AB695CA8BBA75744D024DAFAD436F164 | \n",
"
\n",
" \n",
" 3 | \n",
" SRR11954281 | \n",
" 2020-06-08 14:52:37 | \n",
" 2020-06-08 14:42:54 | \n",
" 476168 | \n",
" 96185936 | \n",
" 476168 | \n",
" 202 | \n",
" 42 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8498349 | \n",
" SAMN14938611_Next_3_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" B8DACF8D30B342B6626A11336FAFB574 | \n",
" 2199CC4352E87B7D164769903E1593D9 | \n",
"
\n",
" \n",
" 4 | \n",
" SRR11953758 | \n",
" 2020-06-08 14:40:57 | \n",
" 2020-06-08 14:37:21 | \n",
" 227460 | \n",
" 45946920 | \n",
" 227460 | \n",
" 202 | \n",
" 20 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8497981 | \n",
" SAMN14938611_Next_1_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" E49F808F6B2715D0804D83AE1BFAFE6E | \n",
" 0E59993DF940CEF7B15ACD859BDAAF46 | \n",
"
\n",
" \n",
" 5 | \n",
" SRR11953779 | \n",
" 2020-06-08 14:49:42 | \n",
" 2020-06-08 14:37:40 | \n",
" 100368 | \n",
" 20274336 | \n",
" 100368 | \n",
" 202 | \n",
" 10 | \n",
" None | \n",
" https://sra-download.ncbi.nlm.nih.gov/traces/s... | \n",
" SRX8497960 | \n",
" SAMN14938611_xGen_1_ERCC-41 | \n",
" RNA-Seq | \n",
" cDNA | \n",
" VIRAL RNA | \n",
" PAIRED | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" Illumina HiSeq 2500 | \n",
" SRP266465 | \n",
" PRJNA622837 | \n",
" None | \n",
" 622837 | \n",
" SRS6796757 | \n",
" SAMN14938611 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" MA_MGH_00001 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" no | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" BROAD INSTITUTE OF HARVARD AND MIT | \n",
" SRA1084625 | \n",
" None | \n",
" public | \n",
" AD83F378D559074CA78EE17C95AA00F0 | \n",
" B0D6D7BD7174939AF6B7D0AE76A1C93A | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n",
"0 SRR11954059 2020-06-08 14:49:46 2020-06-08 14:40:42 42969 8679738 42969 202 4 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498570 SAMN14938611_xGen_3_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public AB5FB1C49D30050D7F36A91653D8D0DF 44AC9D5B9AB98C13C71163EBB4ECD7AD\n",
"1 SRR11954164 2020-06-08 14:52:39 2020-06-08 14:41:48 73987 14945374 73987 202 7 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498465 SAMN14938611_xGen_2_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public 7BA7FA8B66CAA6DEEEBB8C11712C96B3 B437DB6AB394C49E0D4C79CF5F153CE6\n",
"2 SRR11953812 2020-06-08 14:49:46 2020-06-08 14:38:15 300823 60766246 300823 202 29 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498307 SAMN14938611_Next_2_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public E0168B308E9FF193D4C850E01BD038EA AB695CA8BBA75744D024DAFAD436F164\n",
"3 SRR11954281 2020-06-08 14:52:37 2020-06-08 14:42:54 476168 96185936 476168 202 42 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498349 SAMN14938611_Next_3_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public B8DACF8D30B342B6626A11336FAFB574 2199CC4352E87B7D164769903E1593D9\n",
"4 SRR11953758 2020-06-08 14:40:57 2020-06-08 14:37:21 227460 45946920 227460 202 20 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8497981 SAMN14938611_Next_1_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public E49F808F6B2715D0804D83AE1BFAFE6E 0E59993DF940CEF7B15ACD859BDAAF46\n",
"5 SRR11953779 2020-06-08 14:49:42 2020-06-08 14:37:40 100368 20274336 100368 202 10 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8497960 SAMN14938611_xGen_1_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public AD83F378D559074CA78EE17C95AA00F0 B0D6D7BD7174939AF6B7D0AE76A1C93A"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "rjGKrzWaq6CV",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 513
},
"outputId": "73df4eca-f01f-4ccd-e9fa-9c4711aaf56e"
},
"source": [
"ncbi.head()\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Run | \n",
" ReleaseDate | \n",
" LoadDate | \n",
" spots | \n",
" bases | \n",
" spots_with_mates | \n",
" avgLength | \n",
" size_MB | \n",
" AssemblyName | \n",
" download_path | \n",
" Experiment | \n",
" LibraryName | \n",
" LibraryStrategy | \n",
" LibrarySelection | \n",
" LibrarySource | \n",
" LibraryLayout | \n",
" InsertSize | \n",
" InsertDev | \n",
" Platform | \n",
" Model | \n",
" SRAStudy | \n",
" BioProject | \n",
" Study_Pubmed_id | \n",
" ProjectID | \n",
" Sample | \n",
" BioSample | \n",
" SampleType | \n",
" TaxID | \n",
" ScientificName | \n",
" SampleName | \n",
" g1k_pop_code | \n",
" source | \n",
" g1k_analysis_group | \n",
" Subject_ID | \n",
" Sex | \n",
" Disease | \n",
" Tumor | \n",
" Affection_Status | \n",
" Analyte_Type | \n",
" Histological_Type | \n",
" Body_Site | \n",
" CenterName | \n",
" Submission | \n",
" dbgap_study_accession | \n",
" Consent | \n",
" RunHash | \n",
" ReadHash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ERR4694533 | \n",
" 2020-10-20 15:27:30 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
" NaN | \n",
" ERX4615619 | \n",
" NaN | \n",
" AMPLICON | \n",
" PCR | \n",
" VIRAL RNA | \n",
" SINGLE | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" unspecified | \n",
" ERP121228 | \n",
" PRJEB37886 | \n",
" NaN | \n",
" 629258 | \n",
" ERS5218687 | \n",
" SAMEA7460507 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" COG-UK/ALDP-9E7BBA | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" no | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" PUBLIC HEALTH ENGLAND (COLINDALE) | \n",
" ERA3005974 | \n",
" NaN | \n",
" public | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" ERR4694604 | \n",
" 2020-10-20 15:36:08 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
" NaN | \n",
" ERX4615689 | \n",
" NaN | \n",
" AMPLICON | \n",
" PCR | \n",
" VIRAL RNA | \n",
" SINGLE | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" unspecified | \n",
" ERP121228 | \n",
" PRJEB37886 | \n",
" NaN | \n",
" 629258 | \n",
" ERS5218756 | \n",
" SAMEA7460576 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" COG-UK/ALDP-9E85D9 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" no | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" PUBLIC HEALTH ENGLAND (COLINDALE) | \n",
" ERA3006200 | \n",
" NaN | \n",
" public | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" ERR4694586 | \n",
" 2020-10-20 15:27:31 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
" NaN | \n",
" ERX4615671 | \n",
" NaN | \n",
" AMPLICON | \n",
" PCR | \n",
" VIRAL RNA | \n",
" SINGLE | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" unspecified | \n",
" ERP121228 | \n",
" PRJEB37886 | \n",
" NaN | \n",
" 629258 | \n",
" ERS5218738 | \n",
" SAMEA7460558 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" COG-UK/ALDP-9E869A | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" no | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" PUBLIC HEALTH ENGLAND (COLINDALE) | \n",
" ERA3006136 | \n",
" NaN | \n",
" public | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" ERR4694593 | \n",
" 2020-10-20 15:27:31 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
" NaN | \n",
" ERX4615678 | \n",
" NaN | \n",
" AMPLICON | \n",
" PCR | \n",
" VIRAL RNA | \n",
" SINGLE | \n",
" 0 | \n",
" 0 | \n",
" ILLUMINA | \n",
" unspecified | \n",
" ERP121228 | \n",
" PRJEB37886 | \n",
" NaN | \n",
" 629258 | \n",
" ERS5218745 | \n",
" SAMEA7460565 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" COG-UK/ALDP-9E86E5 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" no | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" PUBLIC HEALTH ENGLAND (COLINDALE) | \n",
" ERA3006159 | \n",
" NaN | \n",
" public | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" ERR4694601 | \n",
" 2020-10-20 15:36:08 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
" NaN | \n",
" ERX4615686 | \n",
" NaN | \n",
" AMPLICON | \n",
" PCR | \n",
" VIRAL RNA | \n",
" SINGLE | \n",
" 0 | \n",
" 0 | \n",
" OXFORD_NANOPORE | \n",
" GridION | \n",
" ERP121228 | \n",
" PRJEB37886 | \n",
" NaN | \n",
" 629258 | \n",
" ERS5218753 | \n",
" SAMEA7460573 | \n",
" simple | \n",
" 2697049 | \n",
" Severe acute respiratory syndrome coronavirus 2 | \n",
" COG-UK/ALDP-9E8D3B | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" no | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" CENTRE FOR ENZYME INNOVATION, UNIVERSITY OF PO... | \n",
" ERA3006185 | \n",
" NaN | \n",
" public | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n",
"0 ERR4694533 2020-10-20 15:27:30 NaN 0 0 0 0 0 NaN NaN ERX4615619 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218687 SAMEA7460507 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E7BBA NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3005974 NaN public NaN NaN\n",
"1 ERR4694604 2020-10-20 15:36:08 NaN 0 0 0 0 0 NaN NaN ERX4615689 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218756 SAMEA7460576 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E85D9 NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3006200 NaN public NaN NaN\n",
"2 ERR4694586 2020-10-20 15:27:31 NaN 0 0 0 0 0 NaN NaN ERX4615671 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218738 SAMEA7460558 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E869A NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3006136 NaN public NaN NaN\n",
"3 ERR4694593 2020-10-20 15:27:31 NaN 0 0 0 0 0 NaN NaN ERX4615678 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218745 SAMEA7460565 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E86E5 NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3006159 NaN public NaN NaN\n",
"4 ERR4694601 2020-10-20 15:36:08 NaN 0 0 0 0 0 NaN NaN ERX4615686 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 OXFORD_NANOPORE GridION ERP121228 PRJEB37886 NaN 629258 ERS5218753 SAMEA7460573 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E8D3B NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN CENTRE FOR ENZYME INNOVATION, UNIVERSITY OF PO... ERA3006185 NaN public NaN NaN"
]
},
"metadata": {
"tags": []
},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "rjIsbCMWb4GV",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "fdb953b0-3c36-43a9-c699-6dc039a21266"
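},
"source": [
"# Count ILLUMINA runs per BioProject, ordered from fewest to most runs.\n",
"# Single quotes mark 'ILLUMINA' as a SQL string literal rather than an identifier.\n",
"pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' group by BioProject order by N\")"
],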
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" BioProject N\n",
"0 PRJEB38459 1\n",
"1 PRJEB39737 1\n",
"2 PRJNA608651 1\n",
"3 PRJNA623683 1\n",
"4 PRJNA623797 1\n",
"5 PRJNA623895 1\n",
"6 PRJNA624231 1\n",
"7 PRJNA625669 1\n",
"8 PRJNA626526 1\n",
"9 PRJNA633241 1\n",
"10 PRJNA635017 1\n",
"11 PRJNA636004 1\n",
"12 PRJNA637892 1\n",
"13 PRJNA647057 1\n",
"14 PRJNA657938 1\n",
"15 PRJNA657985 1\n",
"16 PRJNA658211 1\n",
"17 PRJNA658242 1\n",
"18 PRJNA666189 1\n",
"19 PRJNA231221 2\n",
"20 PRJNA624358 2\n",
"21 PRJNA624792 2\n",
"22 PRJNA629891 2\n",
"23 PRJNA630716 2\n",
"24 PRJNA643574 2\n",
"25 PRJNA657893 2\n",
"26 PRJNA667798 2\n",
"27 PRJNA669553 2\n",
"28 PRJNA670222 2\n",
"29 PRJNA637285 3\n",
"30 PRJNA644357 3\n",
"31 PRJNA607948 4\n",
"32 PRJNA616147 4\n",
"33 PRJNA627977 4\n",
"34 PRJNA634194 4\n",
"35 PRJEB39632 5\n",
"36 PRJNA628043 5\n",
"37 PRJNA639591 5\n",
"38 PRJEB38101 6\n",
"39 PRJEB38351 6\n",
"40 PRJNA615319 6\n",
"41 PRJNA627354 6\n",
"42 PRJNA647448 7\n",
"43 PRJNA668631 7\n",
"44 PRJNA605907 8\n",
"45 PRJNA605983 9\n",
"46 PRJNA662589 9\n",
"47 PRJNA616446 10\n",
"48 PRJNA638211 10\n",
"49 PRJNA666543 10\n",
"50 PRJNA627229 11\n",
"51 PRJNA636446 11\n",
"52 PRJNA666696 12\n",
"53 PRJNA663402 14\n",
"54 PRJNA645051 17\n",
"55 PRJEB38369 18\n",
"56 PRJNA622817 18\n",
"57 PRJNA645048 21\n",
"58 PRJNA650134 22\n",
"59 PRJNA659293 23\n",
"60 PRJNA614546 24\n",
"61 PRJNA645052 30\n",
"62 PRJNA662684 33\n",
"63 PRJNA629889 43\n",
"64 PRJNA639864 46\n",
"65 PRJNA632475 47\n",
"66 PRJNA631042 49\n",
"67 PRJNA634119 49\n",
"68 PRJEB39761 50\n",
"69 PRJEB38546 52\n",
"70 PRJNA648306 56\n",
"71 PRJNA650037 60\n",
"72 PRJNA634356 67\n",
"73 PRJNA666219 69\n",
"74 PRJNA649101 72\n",
"75 PRJNA643575 85\n",
"76 PRJNA667180 102\n",
"77 PRJNA627662 112\n",
"78 PRJNA639956 164\n",
"79 PRJNA656695 171\n",
"80 PRJNA633948 204\n",
"81 PRJNA656534 209\n",
"82 PRJNA647529 212\n",
"83 PRJEB37513 244\n",
"84 PRJNA631061 254\n",
"85 PRJEB40443 346\n",
"86 PRJNA662193 400\n",
"87 PRJEB39887 468\n",
"88 PRJNA636748 516\n",
"89 PRJEB38723 542\n",
"90 PRJNA639066 696\n",
"91 PRJNA650245 864\n",
"92 PRJNA625551 904\n",
"93 PRJNA612578 964\n",
"94 PRJNA610428 1066\n",
"95 PRJNA645906 1322\n",
"96 PRJNA622837 1486\n",
"97 PRJNA655577 1536\n",
"98 PRJNA614995 1983\n",
"99 PRJNA613958 10834\n",
"100 PRJEB37886 56281"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
}
]
}