{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "SRA_analysis", "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "ipPtJp7-b310" }, "source": [ "# Obtaining SRA metadata for SARS-CoV-2\n", "------\n", "Here we select the \"best\" datasets for reanalysis using best-practice Galaxy SARS-CoV-2 workflows. The first step is to go to https://www.ncbi.nlm.nih.gov/sra and run a query with the following search term: `txid2697049[Organism:noexp]`.\n", "\n", "Next, download the search results using the `Send to:` menu, selecting `File` and then `RunInfo`. The resulting CSV file is loaded into pandas below." ] }, { "cell_type": "code", "metadata": { "id": "IcOEx6BKK2CQ" }, "source": [ "!pip3 install datapane" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "Zd_rF6Oi_J2r" }, "source": [ "# Set your datapane.com token here\n", "import datapane as dp\n", "dp.login(token=\"xxxxx\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "lax0gNVqM8pq" }, "source": [ "import pandas as pd\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "pd.set_option('display.width', 1000)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "txFIYbuNN-GT", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "90aa70a3-3ea9-4a2f-d206-34b06e1148ae" }, "source": [ "pip install -U pandasql" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Requirement already up-to-date: pandasql in /usr/local/lib/python3.6/dist-packages (0.7.3)\n", "Requirement already satisfied, skipping upgrade: numpy in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.19.5)\n", "Requirement already satisfied, skipping upgrade: pandas in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.1.5)\n", "Requirement 
already satisfied, skipping upgrade: sqlalchemy in /usr/local/lib/python3.6/dist-packages (from pandasql) (1.3.23)\n", "Requirement already satisfied, skipping upgrade: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->pandasql) (2018.9)\n", "Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas->pandasql) (2.8.1)\n", "Requirement already satisfied, skipping upgrade: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.7.3->pandas->pandasql) (1.15.0)\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "zgjRl2Iw9tJu" }, "source": [ "from pandasql import sqldf\n", "# Helper: run an SQL query against the DataFrames defined in this notebook's global namespace\n", "pysqldf = lambda q: sqldf(q, globals())" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "_NcO_3fu1a97" }, "source": [ "## Processing NCBI metadata\n", "\n", "The metadata is obtained directly from the SRA website by selecting all SRA datasets for `txid` `2697049`, saving the results as a `RunInfo` table, compressing it, and uploading it to this notebook." 
] }, { "cell_type": "code", "metadata": { "id": "7cqmUaLQAvKO" }, "source": [ "ncbi = pd.read_csv('https://github.com/galaxyproject/SARS-CoV-2/raw/master/data/var/SRA_Jan20_2021.csv.gz')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "HJL0oX7kmkoT", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "ac7c1fa7-8c1c-43fe-fce1-88808222ce69" }, "source": [ "print(ncbi.columns)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Index(['Run', 'ReleaseDate', 'LoadDate', 'spots', 'bases', 'spots_with_mates', 'avgLength', 'size_MB', 'AssemblyName', 'download_path', 'Experiment', 'LibraryName', 'LibraryStrategy', 'LibrarySelection', 'LibrarySource', 'LibraryLayout', 'InsertSize', 'InsertDev', 'Platform', 'Model', 'SRAStudy', 'BioProject', 'Study_Pubmed_id', 'ProjectID', 'Sample', 'BioSample', 'SampleType', 'TaxID', 'ScientificName', 'SampleName', 'g1k_pop_code', 'source', 'g1k_analysis_group', 'Subject_ID', 'Sex', 'Disease', 'Tumor', 'Affection_Status', 'Analyte_Type', 'Histological_Type', 'Body_Site', 'CenterName', 'Submission', 'dbgap_study_accession', 'Consent', 'RunHash', 'ReadHash'], dtype='object')\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "r9-AZsIOrdA7", "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "outputId": "cbf72a46-003a-4a07-d2b0-38fae610005d" }, "source": [ "pysqldf('select count(distinct BioProject) from ncbi')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count(distinct BioProject)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>192</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count(distinct BioProject)\n", "0 192" ] }, "metadata": { "tags": [] }, "execution_count": 7 } ] }, { "cell_type": "code", "metadata": { "id": "4P98Fgojr60q", "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "outputId": "f8a257aa-a552-4a1e-a4f2-f62e2fb3157f" }, "source": [ "pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\"')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count(distinct BioProject)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>149</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count(distinct BioProject)\n", "0 149" ] }, "metadata": { "tags": [] }, "execution_count": 8 } ] }, { "cell_type": "code", "metadata": { "id": "8JFDv0QgsG-i", "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "outputId": "9c4b88b1-4fd2-499a-ce6e-6f0262dc39e7" }, "source": [ "pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\"')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count(distinct BioProject)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>33</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count(distinct BioProject)\n", "0 33" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "code", "metadata": { "id": "nF8XjvAbs3Sv", "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "outputId": "c6578f49-a0ef-4297-d31b-d44710509e92" }, "source": [ "pysqldf('select count(*) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\"')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count(*)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3351</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count(*)\n", "0 3351" ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "code", "metadata": { "id": "XhFOvlI9sprE", "colab": { "base_uri": "https://localhost:8080/", "height": 80 }, "outputId": "c8de0270-ff0e-4e0b-8a78-4a0b7a89d1ec" }, "source": [ "pysqldf('select count(distinct BioProject) from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\"')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count(distinct BioProject)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>31</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count(distinct BioProject)\n", "0 31" ] }, "metadata": { "tags": [] }, "execution_count": 11 } ] }, { "cell_type": "code", "metadata": { "id": "1RWPQfg-tUot", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "2c2b46f3-7a97-4e9a-f7ee-37211cb8da99" }, "source": [ "pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"RNA-Seq\" and LibraryLayout=\"PAIRED\" group by BioProject order by N desc')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>BioProject</th><th>ReleaseDate</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>PRJNA622837</td><td>2020-06-08 14:49:42</td><td>1564</td></tr>\n",
"    <tr><th>1</th><td>PRJNA612578</td><td>2020-03-17 01:31:51</td><td>964</td></tr>\n",
"    <tr><th>2</th><td>PRJNA650245</td><td>2020-08-19 16:26:12</td><td>617</td></tr>\n",
"    <tr><th>3</th><td>PRJNA610428</td><td>2020-06-06 00:29:31</td><td>42</td></tr>\n",
"    <tr><th>4</th><td>PRJEB38546</td><td>2020-10-17 18:53:39</td><td>26</td></tr>\n",
"    <tr><th>5</th><td>PRJNA634356</td><td>2020-09-27 17:31:30</td><td>25</td></tr>\n",
"    <tr><th>6</th><td>PRJNA650134</td><td>2020-08-01 14:16:35</td><td>22</td></tr>\n",
"    <tr><th>7</th><td>PRJNA661544</td><td>2020-11-19 20:02:39</td><td>15</td></tr>\n",
"    <tr><th>8</th><td>PRJNA638211</td><td>2020-07-31 11:46:07</td><td>10</td></tr>\n",
"    <tr><th>9</th><td>PRJNA605983</td><td>2020-02-15 11:40:11</td><td>9</td></tr>\n",
"    <tr><th>10</th><td>PRJNA605907</td><td>2020-02-22 21:16:12</td><td>8</td></tr>\n",
"    <tr><th>11</th><td>PRJNA616446</td><td>2020-04-02 00:08:41</td><td>7</td></tr>\n",
"    <tr><th>12</th><td>PRJNA615319</td><td>2020-04-01 02:23:15</td><td>6</td></tr>\n",
"    <tr><th>13</th><td>PRJNA639591</td><td>2020-07-07 00:15:17</td><td>5</td></tr>\n",
"    <tr><th>14</th><td>PRJNA673055</td><td>2020-11-10 23:37:43</td><td>5</td></tr>\n",
"    <tr><th>15</th><td>PRJNA634194</td><td>2020-05-20 22:28:37</td><td>4</td></tr>\n",
"    <tr><th>16</th><td>PRJNA623001</td><td>2020-12-06 18:25:32</td><td>3</td></tr>\n",
"    <tr><th>17</th><td>PRJNA644357</td><td>2020-09-01 17:09:18</td><td>3</td></tr>\n",
"    <tr><th>18</th><td>PRJNA636446</td><td>2020-06-01 16:48:32</td><td>2</td></tr>\n",
"    <tr><th>19</th><td>PRJNA643574</td><td>2020-07-02 07:59:17</td><td>2</td></tr>\n",
"    <tr><th>20</th><td>PRJNA669553</td><td>2020-10-19 12:17:41</td><td>2</td></tr>\n",
"    <tr><th>21</th><td>PRJEB38459</td><td>2020-07-24 15:14:14</td><td>1</td></tr>\n",
"    <tr><th>22</th><td>PRJEB39737</td><td>2020-08-12 08:55:30</td><td>1</td></tr>\n",
"    <tr><th>23</th><td>PRJEB41216</td><td>2020-11-13 18:40:39</td><td>1</td></tr>\n",
"    <tr><th>24</th><td>PRJNA608651</td><td>2020-02-25 09:00:40</td><td>1</td></tr>\n",
"    <tr><th>25</th><td>PRJNA623895</td><td>2020-04-10 09:13:31</td><td>1</td></tr>\n",
"    <tr><th>26</th><td>PRJNA624231</td><td>2020-05-06 17:28:09</td><td>1</td></tr>\n",
"    <tr><th>27</th><td>PRJNA625669</td><td>2020-04-17 01:08:31</td><td>1</td></tr>\n",
"    <tr><th>28</th><td>PRJNA630716</td><td>2020-09-09 19:43:23</td><td>1</td></tr>\n",
"    <tr><th>29</th><td>PRJNA658242</td><td>2020-08-20 13:10:27</td><td>1</td></tr>\n",
"    <tr><th>30</th><td>PRJNA689000</td><td>2021-01-01 16:17:28</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " BioProject ReleaseDate N\n", "0 PRJNA622837 2020-06-08 14:49:42 1564\n", "1 PRJNA612578 2020-03-17 01:31:51 964\n", "2 PRJNA650245 2020-08-19 16:26:12 617\n", "3 PRJNA610428 2020-06-06 00:29:31 42\n", "4 PRJEB38546 2020-10-17 18:53:39 26\n", "5 PRJNA634356 2020-09-27 17:31:30 25\n", "6 PRJNA650134 2020-08-01 14:16:35 22\n", "7 PRJNA661544 2020-11-19 20:02:39 15\n", "8 PRJNA638211 2020-07-31 11:46:07 10\n", "9 PRJNA605983 2020-02-15 11:40:11 9\n", "10 PRJNA605907 2020-02-22 21:16:12 8\n", "11 PRJNA616446 2020-04-02 00:08:41 7\n", "12 PRJNA615319 2020-04-01 02:23:15 6\n", "13 PRJNA639591 2020-07-07 00:15:17 5\n", "14 PRJNA673055 2020-11-10 23:37:43 5\n", "15 PRJNA634194 2020-05-20 22:28:37 4\n", "16 PRJNA623001 2020-12-06 18:25:32 3\n", "17 PRJNA644357 2020-09-01 17:09:18 3\n", "18 PRJNA636446 2020-06-01 16:48:32 2\n", "19 PRJNA643574 2020-07-02 07:59:17 2\n", "20 PRJNA669553 2020-10-19 12:17:41 2\n", "21 PRJEB38459 2020-07-24 15:14:14 1\n", "22 PRJEB39737 2020-08-12 08:55:30 1\n", "23 PRJEB41216 2020-11-13 18:40:39 1\n", "24 PRJNA608651 2020-02-25 09:00:40 1\n", "25 PRJNA623895 2020-04-10 09:13:31 1\n", "26 PRJNA624231 2020-05-06 17:28:09 1\n", "27 PRJNA625669 2020-04-17 01:08:31 1\n", "28 PRJNA630716 2020-09-09 19:43:23 1\n", "29 PRJNA658242 2020-08-20 13:10:27 1\n", "30 PRJNA689000 2021-01-01 16:17:28 1" ] }, "metadata": { "tags": [] }, "execution_count": 12 } ] }, { "cell_type": "code", "metadata": { "id": "zgypVKryKqsJ", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "46834d00-5e8e-4817-a9a5-4d080d71b64f" }, "source": [ "pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"ILLUMINA\" and LibraryStrategy=\"AMPLICON\" and LibraryLayout=\"PAIRED\" group by BioProject order by ReleaseDate,N desc')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>BioProject</th><th>ReleaseDate</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>PRJNA614546</td><td>2020-03-23 22:30:33</td><td>24</td></tr>\n",
"    <tr><th>1</th><td>PRJNA613958</td><td>2020-03-24 02:56:38</td><td>14860</td></tr>\n",
"    <tr><th>2</th><td>PRJNA614995</td><td>2020-03-24 19:53:13</td><td>3967</td></tr>\n",
"    <tr><th>3</th><td>PRJNA622817</td><td>2020-04-05 03:54:10</td><td>18</td></tr>\n",
"    <tr><th>4</th><td>PRJNA623683</td><td>2020-04-09 04:58:14</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>PRJNA616147</td><td>2020-04-14 21:13:31</td><td>4</td></tr>\n",
"    <tr><th>6</th><td>PRJNA625551</td><td>2020-04-17 16:26:15</td><td>1163</td></tr>\n",
"    <tr><th>7</th><td>PRJNA627229</td><td>2020-04-23 05:01:42</td><td>11</td></tr>\n",
"    <tr><th>8</th><td>PRJEB37886</td><td>2020-05-01 12:26:32</td><td>104984</td></tr>\n",
"    <tr><th>9</th><td>PRJNA629891</td><td>2020-05-02 02:12:35</td><td>2</td></tr>\n",
"    <tr><th>10</th><td>PRJNA631042</td><td>2020-05-12 12:50:25</td><td>37</td></tr>\n",
"    <tr><th>11</th><td>PRJNA627662</td><td>2020-05-13 11:13:04</td><td>112</td></tr>\n",
"    <tr><th>12</th><td>PRJNA633948</td><td>2020-05-24 21:40:17</td><td>204</td></tr>\n",
"    <tr><th>13</th><td>PRJNA629889</td><td>2020-05-27 19:14:35</td><td>18</td></tr>\n",
"    <tr><th>14</th><td>PRJNA636446</td><td>2020-06-01 16:50:40</td><td>7</td></tr>\n",
"    <tr><th>15</th><td>PRJNA634119</td><td>2020-06-15 21:00:36</td><td>49</td></tr>\n",
"    <tr><th>16</th><td>PRJEB38369</td><td>2020-06-17 19:16:32</td><td>18</td></tr>\n",
"    <tr><th>17</th><td>PRJEB38723</td><td>2020-06-21 15:16:39</td><td>542</td></tr>\n",
"    <tr><th>18</th><td>PRJNA639066</td><td>2020-06-26 11:25:40</td><td>1931</td></tr>\n",
"    <tr><th>19</th><td>PRJNA636748</td><td>2020-06-29 14:57:15</td><td>516</td></tr>\n",
"    <tr><th>20</th><td>PRJNA643575</td><td>2020-07-07 08:37:40</td><td>85</td></tr>\n",
"    <tr><th>21</th><td>PRJNA647529</td><td>2020-07-21 04:20:18</td><td>212</td></tr>\n",
"    <tr><th>22</th><td>PRJNA647448</td><td>2020-07-21 12:31:35</td><td>7</td></tr>\n",
"    <tr><th>23</th><td>PRJNA649101</td><td>2020-07-30 07:49:28</td><td>24</td></tr>\n",
"    <tr><th>24</th><td>PRJNA645906</td><td>2020-07-30 10:24:28</td><td>2286</td></tr>\n",
"    <tr><th>25</th><td>PRJNA656534</td><td>2020-08-11 16:24:59</td><td>567</td></tr>\n",
"    <tr><th>26</th><td>PRJNA656695</td><td>2020-08-12 08:07:18</td><td>171</td></tr>\n",
"    <tr><th>27</th><td>PRJNA610428</td><td>2020-08-18 20:43:08</td><td>20</td></tr>\n",
"    <tr><th>28</th><td>PRJNA650037</td><td>2020-08-20 13:10:22</td><td>60</td></tr>\n",
"    <tr><th>29</th><td>PRJNA662589</td><td>2020-09-23 00:00:28</td><td>9</td></tr>\n",
"    <tr><th>30</th><td>PRJEB40443</td><td>2020-09-24 13:54:32</td><td>346</td></tr>\n",
"    <tr><th>31</th><td>PRJNA666543</td><td>2020-09-30 09:13:49</td><td>16</td></tr>\n",
"    <tr><th>32</th><td>PRJEB39887</td><td>2020-10-05 09:34:44</td><td>468</td></tr>\n",
"    <tr><th>33</th><td>PRJEB38546</td><td>2020-10-17 18:53:39</td><td>26</td></tr>\n",
"    <tr><th>34</th><td>PRJNA670222</td><td>2020-10-20 16:37:16</td><td>2</td></tr>\n",
"    <tr><th>35</th><td>PRJEB40394</td><td>2020-10-22 12:06:40</td><td>48</td></tr>\n",
"    <tr><th>36</th><td>PRJEB39849</td><td>2020-10-30 14:40:23</td><td>420</td></tr>\n",
"    <tr><th>37</th><td>PRJNA673096</td><td>2020-11-01 13:09:09</td><td>246</td></tr>\n",
"    <tr><th>38</th><td>PRJNA673341</td><td>2020-11-01 16:14:27</td><td>7</td></tr>\n",
"    <tr><th>39</th><td>PRJEB40188</td><td>2020-11-03 16:12:51</td><td>80</td></tr>\n",
"    <tr><th>40</th><td>PRJNA679460</td><td>2020-11-19 13:16:02</td><td>208</td></tr>\n",
"    <tr><th>41</th><td>PRJNA681234</td><td>2020-11-28 02:21:13</td><td>9</td></tr>\n",
"    <tr><th>42</th><td>PRJNA681574</td><td>2020-11-30 14:53:35</td><td>151</td></tr>\n",
"    <tr><th>43</th><td>PRJNA679980</td><td>2020-12-07 00:00:23</td><td>6</td></tr>\n",
"    <tr><th>44</th><td>PRJNA686083</td><td>2020-12-20 07:35:42</td><td>81</td></tr>\n",
"    <tr><th>45</th><td>PRJEB42024</td><td>2020-12-21 10:46:36</td><td>539</td></tr>\n",
"    <tr><th>46</th><td>PRJNA685400</td><td>2020-12-22 05:15:23</td><td>28</td></tr>\n",
"    <tr><th>47</th><td>PRJNA682735</td><td>2020-12-24 17:50:34</td><td>315</td></tr>\n",
"    <tr><th>48</th><td>PRJNA665485</td><td>2020-12-31 00:07:20</td><td>8</td></tr>\n",
"    <tr><th>49</th><td>PRJNA669553</td><td>2020-12-31 00:11:13</td><td>122</td></tr>\n",
"    <tr><th>50</th><td>PRJNA669862</td><td>2021-01-01 04:16:20</td><td>8</td></tr>\n",
"    <tr><th>51</th><td>PRJNA689811</td><td>2021-01-05 15:25:38</td><td>1</td></tr>\n",
"    <tr><th>52</th><td>PRJNA686984</td><td>2021-01-06 19:44:09</td><td>543</td></tr>\n",
"    <tr><th>53</th><td>PRJNA628662</td><td>2021-01-11 13:39:29</td><td>4</td></tr>\n",
"    <tr><th>54</th><td>PRJNA657032</td><td>2021-01-12 00:10:32</td><td>71</td></tr>\n",
"    <tr><th>55</th><td>PRJNA692472</td><td>2021-01-15 17:04:55</td><td>47</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " BioProject ReleaseDate N\n", "0 PRJNA614546 2020-03-23 22:30:33 24\n", "1 PRJNA613958 2020-03-24 02:56:38 14860\n", "2 PRJNA614995 2020-03-24 19:53:13 3967\n", "3 PRJNA622817 2020-04-05 03:54:10 18\n", "4 PRJNA623683 2020-04-09 04:58:14 1\n", "5 PRJNA616147 2020-04-14 21:13:31 4\n", "6 PRJNA625551 2020-04-17 16:26:15 1163\n", "7 PRJNA627229 2020-04-23 05:01:42 11\n", "8 PRJEB37886 2020-05-01 12:26:32 104984\n", "9 PRJNA629891 2020-05-02 02:12:35 2\n", "10 PRJNA631042 2020-05-12 12:50:25 37\n", "11 PRJNA627662 2020-05-13 11:13:04 112\n", "12 PRJNA633948 2020-05-24 21:40:17 204\n", "13 PRJNA629889 2020-05-27 19:14:35 18\n", "14 PRJNA636446 2020-06-01 16:50:40 7\n", "15 PRJNA634119 2020-06-15 21:00:36 49\n", "16 PRJEB38369 2020-06-17 19:16:32 18\n", "17 PRJEB38723 2020-06-21 15:16:39 542\n", "18 PRJNA639066 2020-06-26 11:25:40 1931\n", "19 PRJNA636748 2020-06-29 14:57:15 516\n", "20 PRJNA643575 2020-07-07 08:37:40 85\n", "21 PRJNA647529 2020-07-21 04:20:18 212\n", "22 PRJNA647448 2020-07-21 12:31:35 7\n", "23 PRJNA649101 2020-07-30 07:49:28 24\n", "24 PRJNA645906 2020-07-30 10:24:28 2286\n", "25 PRJNA656534 2020-08-11 16:24:59 567\n", "26 PRJNA656695 2020-08-12 08:07:18 171\n", "27 PRJNA610428 2020-08-18 20:43:08 20\n", "28 PRJNA650037 2020-08-20 13:10:22 60\n", "29 PRJNA662589 2020-09-23 00:00:28 9\n", "30 PRJEB40443 2020-09-24 13:54:32 346\n", "31 PRJNA666543 2020-09-30 09:13:49 16\n", "32 PRJEB39887 2020-10-05 09:34:44 468\n", "33 PRJEB38546 2020-10-17 18:53:39 26\n", "34 PRJNA670222 2020-10-20 16:37:16 2\n", "35 PRJEB40394 2020-10-22 12:06:40 48\n", "36 PRJEB39849 2020-10-30 14:40:23 420\n", "37 PRJNA673096 2020-11-01 13:09:09 246\n", "38 PRJNA673341 2020-11-01 16:14:27 7\n", "39 PRJEB40188 2020-11-03 16:12:51 80\n", "40 PRJNA679460 2020-11-19 13:16:02 208\n", "41 PRJNA681234 2020-11-28 02:21:13 9\n", "42 PRJNA681574 2020-11-30 14:53:35 151\n", "43 PRJNA679980 2020-12-07 00:00:23 6\n", "44 PRJNA686083 2020-12-20 07:35:42 81\n", "45 
PRJEB42024 2020-12-21 10:46:36 539\n", "46 PRJNA685400 2020-12-22 05:15:23 28\n", "47 PRJNA682735 2020-12-24 17:50:34 315\n", "48 PRJNA665485 2020-12-31 00:07:20 8\n", "49 PRJNA669553 2020-12-31 00:11:13 122\n", "50 PRJNA669862 2021-01-01 04:16:20 8\n", "51 PRJNA689811 2021-01-05 15:25:38 1\n", "52 PRJNA686984 2021-01-06 19:44:09 543\n", "53 PRJNA628662 2021-01-11 13:39:29 4\n", "54 PRJNA657032 2021-01-12 00:10:32 71\n", "55 PRJNA692472 2021-01-15 17:04:55 47" ] }, "metadata": { "tags": [] }, "execution_count": 13 } ] }, { "cell_type": "code", "metadata": { "id": "A6pJykchKqGB", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "bd30abee-27b0-4456-ff74-2a38846f6e7f" }, "source": [ "pysqldf('select BioProject, ReleaseDate,count(*) as N from ncbi where Platform=\"OXFORD_NANOPORE\" and LibraryStrategy=\"AMPLICON\" group by BioProject order by ReleaseDate,N desc')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>BioProject</th><th>ReleaseDate</th><th>N</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>PRJNA613958</td><td>2020-03-24 02:56:38</td><td>6</td></tr>\n",
"    <tr><th>1</th><td>PRJNA614995</td><td>2020-03-24 20:18:34</td><td>126</td></tr>\n",
"    <tr><th>2</th><td>PRJNA622817</td><td>2020-04-05 04:02:53</td><td>5</td></tr>\n",
"    <tr><th>3</th><td>PRJNA616147</td><td>2020-04-14 23:47:18</td><td>3</td></tr>\n",
"    <tr><th>4</th><td>PRJNA627229</td><td>2020-04-23 05:01:42</td><td>56</td></tr>\n",
"    <tr><th>5</th><td>PRJNA610248</td><td>2020-04-28 22:04:43</td><td>162</td></tr>\n",
"    <tr><th>6</th><td>PRJNA614976</td><td>2020-04-29 20:43:51</td><td>60</td></tr>\n",
"    <tr><th>7</th><td>PRJEB37886</td><td>2020-04-30 11:59:46</td><td>20968</td></tr>\n",
"    <tr><th>8</th><td>PRJNA632678</td><td>2020-05-14 10:09:40</td><td>1</td></tr>\n",
"    <tr><th>9</th><td>PRJEB38388</td><td>2020-05-21 12:11:53</td><td>584</td></tr>\n",
"    <tr><th>10</th><td>PRJEB37966</td><td>2020-05-27 15:10:51</td><td>123</td></tr>\n",
"    <tr><th>11</th><td>PRJNA634965</td><td>2020-06-01 00:08:21</td><td>3</td></tr>\n",
"    <tr><th>12</th><td>PRJNA628662</td><td>2020-07-06 09:38:16</td><td>72</td></tr>\n",
"    <tr><th>13</th><td>PRJNA645718</td><td>2020-07-13 12:02:15</td><td>5</td></tr>\n",
"    <tr><th>14</th><td>PRJNA645970</td><td>2020-07-13 18:54:07</td><td>173</td></tr>\n",
"    <tr><th>15</th><td>PRJEB39487</td><td>2020-07-23 18:43:53</td><td>339</td></tr>\n",
"    <tr><th>16</th><td>PRJEB38459</td><td>2020-07-24 14:46:31</td><td>1</td></tr>\n",
"    <tr><th>17</th><td>PRJNA649101</td><td>2020-07-30 07:49:27</td><td>24</td></tr>\n",
"    <tr><th>18</th><td>PRJNA640656</td><td>2020-08-06 10:36:18</td><td>88</td></tr>\n",
"    <tr><th>19</th><td>PRJNA650037</td><td>2020-08-13 12:33:55</td><td>210</td></tr>\n",
"    <tr><th>20</th><td>PRJNA658490</td><td>2020-08-21 12:57:54</td><td>1</td></tr>\n",
"    <tr><th>21</th><td>PRJNA667434</td><td>2020-10-06 06:16:12</td><td>10</td></tr>\n",
"    <tr><th>22</th><td>PRJEB40711</td><td>2020-10-13 08:22:40</td><td>39</td></tr>\n",
"    <tr><th>23</th><td>PRJEB40277</td><td>2020-10-16 12:24:41</td><td>1130</td></tr>\n",
"    <tr><th>24</th><td>PRJNA669553</td><td>2020-10-19 12:07:54</td><td>228</td></tr>\n",
"    <tr><th>25</th><td>PRJNA669459</td><td>2020-10-20 05:14:45</td><td>15</td></tr>\n",
"    <tr><th>26</th><td>PRJNA670824</td><td>2020-10-23 03:53:32</td><td>101</td></tr>\n",
"    <tr><th>27</th><td>PRJNA669043</td><td>2020-11-12 08:20:43</td><td>255</td></tr>\n",
"    <tr><th>28</th><td>PRJNA682735</td><td>2020-12-24 17:50:34</td><td>18</td></tr>\n",
"    <tr><th>29</th><td>PRJNA688208</td><td>2020-12-29 13:18:26</td><td>3</td></tr>\n",
"    <tr><th>30</th><td>PRJNA686984</td><td>2020-12-30 16:57:41</td><td>1</td></tr>\n",
"    <tr><th>31</th><td>PRJEB39014</td><td>2021-01-06 11:33:41</td><td>944</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " BioProject ReleaseDate N\n", "0 PRJNA613958 2020-03-24 02:56:38 6\n", "1 PRJNA614995 2020-03-24 20:18:34 126\n", "2 PRJNA622817 2020-04-05 04:02:53 5\n", "3 PRJNA616147 2020-04-14 23:47:18 3\n", "4 PRJNA627229 2020-04-23 05:01:42 56\n", "5 PRJNA610248 2020-04-28 22:04:43 162\n", "6 PRJNA614976 2020-04-29 20:43:51 60\n", "7 PRJEB37886 2020-04-30 11:59:46 20968\n", "8 PRJNA632678 2020-05-14 10:09:40 1\n", "9 PRJEB38388 2020-05-21 12:11:53 584\n", "10 PRJEB37966 2020-05-27 15:10:51 123\n", "11 PRJNA634965 2020-06-01 00:08:21 3\n", "12 PRJNA628662 2020-07-06 09:38:16 72\n", "13 PRJNA645718 2020-07-13 12:02:15 5\n", "14 PRJNA645970 2020-07-13 18:54:07 173\n", "15 PRJEB39487 2020-07-23 18:43:53 339\n", "16 PRJEB38459 2020-07-24 14:46:31 1\n", "17 PRJNA649101 2020-07-30 07:49:27 24\n", "18 PRJNA640656 2020-08-06 10:36:18 88\n", "19 PRJNA650037 2020-08-13 12:33:55 210\n", "20 PRJNA658490 2020-08-21 12:57:54 1\n", "21 PRJNA667434 2020-10-06 06:16:12 10\n", "22 PRJEB40711 2020-10-13 08:22:40 39\n", "23 PRJEB40277 2020-10-16 12:24:41 1130\n", "24 PRJNA669553 2020-10-19 12:07:54 228\n", "25 PRJNA669459 2020-10-20 05:14:45 15\n", "26 PRJNA670824 2020-10-23 03:53:32 101\n", "27 PRJNA669043 2020-11-12 08:20:43 255\n", "28 PRJNA682735 2020-12-24 17:50:34 18\n", "29 PRJNA688208 2020-12-29 13:18:26 3\n", "30 PRJNA686984 2020-12-30 16:57:41 1\n", "31 PRJEB39014 2021-01-06 11:33:41 944" ] }, "metadata": { "tags": [] }, "execution_count": 14 } ] }, { "cell_type": "markdown", "metadata": { "id": "ZUzqjj6Ek2GR" }, "source": [ "Number of SRA runs by Library Strategy and Platform shows that Amplicon sequencing using Illumina is most abundant type of data:" ] }, { "cell_type": "code", "metadata": { "id": "k0oCDclt2SbW", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "0e738e80-132c-4b4f-dcfc-aa97211c9408" }, "source": [ "print(pysqldf('select LibraryStrategy, Platform, count(*) as N, count(distinct BioProject) as P from ncbi group by Platform, 
LibraryStrategy order by Platform asc, N desc').to_markdown(index=False))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "| LibraryStrategy | Platform | N | P |\n", "|:--------------------|:----------------|-------:|----:|\n", "| AMPLICON | BGISEQ | 21 | 1 |\n", "| RNA-Seq | BGISEQ | 1 | 1 |\n", "| WGA | BGISEQ | 1 | 1 |\n", "| AMPLICON | CAPILLARY | 7 | 1 |\n", "| AMPLICON | ILLUMINA | 149668 | 61 |\n", "| WGS | ILLUMINA | 6201 | 43 |\n", "| RNA-Seq | ILLUMINA | 4434 | 33 |\n", "| Targeted-Capture | ILLUMINA | 1690 | 11 |\n", "| WGA | ILLUMINA | 377 | 4 |\n", "| OTHER | ILLUMINA | 148 | 13 |\n", "| AMPLICON | ION_TORRENT | 435 | 7 |\n", "| RNA-Seq | ION_TORRENT | 42 | 4 |\n", "| WGS | ION_TORRENT | 33 | 6 |\n", "| AMPLICON | OXFORD_NANOPORE | 25754 | 32 |\n", "| WGS | OXFORD_NANOPORE | 936 | 12 |\n", "| WGA | OXFORD_NANOPORE | 580 | 3 |\n", "| RNA-Seq | OXFORD_NANOPORE | 10 | 5 |\n", "| OTHER | OXFORD_NANOPORE | 4 | 1 |\n", "| AMPLICON | PACBIO_SMRT | 12 | 1 |\n", "| Synthetic-Long-Read | PACBIO_SMRT | 2 | 1 |\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "CfmjR7LcstW5" }, "source": [ "counts = pysqldf('select LibraryStrategy, Platform, count(*) as N, count(distinct BioProject) as P from ncbi group by Platform, LibraryStrategy order by Platform asc, N desc')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 669 }, "id": "oJ-PLinsImnQ", "outputId": "e06f3499-eae8-4cdc-9877-32c2df9dea5e" }, "source": [ "counts" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>LibraryStrategy</th><th>Platform</th><th>N</th><th>P</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>AMPLICON</td><td>BGISEQ</td><td>21</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>RNA-Seq</td><td>BGISEQ</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>WGA</td><td>BGISEQ</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>AMPLICON</td><td>CAPILLARY</td><td>7</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>AMPLICON</td><td>ILLUMINA</td><td>149668</td><td>61</td></tr>\n",
"    <tr><th>5</th><td>WGS</td><td>ILLUMINA</td><td>6201</td><td>43</td></tr>\n",
"    <tr><th>6</th><td>RNA-Seq</td><td>ILLUMINA</td><td>4434</td><td>33</td></tr>\n",
"    <tr><th>7</th><td>Targeted-Capture</td><td>ILLUMINA</td><td>1690</td><td>11</td></tr>\n",
"    <tr><th>8</th><td>WGA</td><td>ILLUMINA</td><td>377</td><td>4</td></tr>\n",
"    <tr><th>9</th><td>OTHER</td><td>ILLUMINA</td><td>148</td><td>13</td></tr>\n",
"    <tr><th>10</th><td>AMPLICON</td><td>ION_TORRENT</td><td>435</td><td>7</td></tr>\n",
"    <tr><th>11</th><td>RNA-Seq</td><td>ION_TORRENT</td><td>42</td><td>4</td></tr>\n",
"    <tr><th>12</th><td>WGS</td><td>ION_TORRENT</td><td>33</td><td>6</td></tr>\n",
"    <tr><th>13</th><td>AMPLICON</td><td>OXFORD_NANOPORE</td><td>25754</td><td>32</td></tr>\n",
"    <tr><th>14</th><td>WGS</td><td>OXFORD_NANOPORE</td><td>936</td><td>12</td></tr>\n",
"    <tr><th>15</th><td>WGA</td><td>OXFORD_NANOPORE</td><td>580</td><td>3</td></tr>\n",
"    <tr><th>16</th><td>RNA-Seq</td><td>OXFORD_NANOPORE</td><td>10</td><td>5</td></tr>\n",
"    <tr><th>17</th><td>OTHER</td><td>OXFORD_NANOPORE</td><td>4</td><td>1</td></tr>\n",
"    <tr><th>18</th><td>AMPLICON</td><td>PACBIO_SMRT</td><td>12</td><td>1</td></tr>\n",
"    <tr><th>19</th><td>Synthetic-Long-Read</td><td>PACBIO_SMRT</td><td>2</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " LibraryStrategy Platform N P\n", "0 AMPLICON BGISEQ 21 1\n", "1 RNA-Seq BGISEQ 1 1\n", "2 WGA BGISEQ 1 1\n", "3 AMPLICON CAPILLARY 7 1\n", "4 AMPLICON ILLUMINA 149668 61\n", "5 WGS ILLUMINA 6201 43\n", "6 RNA-Seq ILLUMINA 4434 33\n", "7 Targeted-Capture ILLUMINA 1690 11\n", "8 WGA ILLUMINA 377 4\n", "9 OTHER ILLUMINA 148 13\n", "10 AMPLICON ION_TORRENT 435 7\n", "11 RNA-Seq ION_TORRENT 42 4\n", "12 WGS ION_TORRENT 33 6\n", "13 AMPLICON OXFORD_NANOPORE 25754 32\n", "14 WGS OXFORD_NANOPORE 936 12\n", "15 WGA OXFORD_NANOPORE 580 3\n", "16 RNA-Seq OXFORD_NANOPORE 10 5\n", "17 OTHER OXFORD_NANOPORE 4 1\n", "18 AMPLICON PACBIO_SMRT 12 1\n", "19 Synthetic-Long-Read PACBIO_SMRT 2 1" ] }, "metadata": { "tags": [] }, "execution_count": 17 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 534 }, "id": "W4f3U376m1US", "outputId": "4ff3b2ef-8d64-40f6-a074-a383552dd92b" }, "source": [ "import pandas as pd\n", "from math import pi\n", "import bokeh.io\n", "from bokeh.models import (BasicTicker, ColorBar, ColumnDataSource,\n", " LogColorMapper, PrintfTickFormatter,LinearColorMapper,ContinuousColorMapper,LogTicker)\n", "from bokeh.plotting import figure\n", "from bokeh.transform import transform\n", "from bokeh.palettes import viridis\n", "\n", "bokeh.io.output_notebook()\n", "\n", "source = ColumnDataSource(counts)\n", "colors = list(reversed(viridis(64)))\n", "\n", "mapper = LogColorMapper(palette=colors, low=counts['N'].min(), high=counts['N'].max())\n", "\n", "TOOLTIPS = [\n", " (\"SRA accessions\",\"@N\"),\n", " (\"BioProjects\",\"@P\")\n", "]\n", "\n", "p = figure(\n", " plot_width=600, \n", " plot_height=500, \n", " x_range=counts['LibraryStrategy'].unique(), \n", " y_range=counts['Platform'].unique(),\n", " x_axis_location=\"above\",\n", " tooltips=TOOLTIPS,\n", " tools='save',\n", " )\n", "p.rect(\n", " x=\"LibraryStrategy\", \n", " y=\"Platform\", \n", " width=1, \n", " height=1, \n", " 
source=source,\n", " line_color=None, \n", " fill_color=transform('N', mapper)\n", " )\n", "color_bar = ColorBar(\n", " color_mapper=mapper, \n", " location=(0, 0),\n", " ticker=LogTicker(),\n", " label_standoff=12,\n", " formatter=PrintfTickFormatter(format=\"%d\")\n", " )\n", "p.add_layout(color_bar, 'right')\n", "p.xaxis.major_label_orientation = pi/2\n", "p.axis.axis_line_color = None\n", "p.axis.major_tick_line_color = None\n", "p.ygrid.grid_line_color = None\n", "p.xgrid.grid_line_color = None\n", "\n", "try:\n", " bokeh.io.reset_output()\n", " bokeh.io.output_notebook()\n", " bokeh.io.show(p)\n", "except:\n", " bokeh.io.output_notebook()\n", " bokeh.io.show(p)\n", "\n", "r = dp.Report(\n", " dp.Plot(p)\n", ")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "\u001b[32mConnected successfully to https://datapane.com as nekrut\u001b[0m\n" ], "name": "stdout" }, { "output_type": "display_data", "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if 
(id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", 
" toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? 
*/\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(null);\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " 
console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " \n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to 
plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(null);\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n 
element.onerror = on_error;\n element.async = false;\n element.src = url;\n \n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": { "tags": [] } }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": { "tags": [] } }, { "output_type": "display_data", "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"c07fda4c-7d75-4f8b-bcef-d0eb9b845f75\":{\"roots\":{\"references\":[{\"attributes\":{\"above\":[{\"id\":\"3706\"}],\"center\":[{\"id\":\"3708\"},{\"id\":\"3711\"}],\"left\":[{\"id\":\"3709\"}],\"plot_height\":500,\"renderers\":[{\"id\":\"3720\"}],\"right\":[{\"id\":\"3724\"}],\"title\":{\"id\":\"3728\"},\"toolbar\":{\"id\":\"3714\"},\"x_range\":{\"id\":\"3698\"},\"x_scale\":{\"id\":\"3702\"},\"y_range\":{\"id\":\"3700\"},\"y_scale\":{\"id\":\"3704\"}},\"id\":\"3697\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"data_source\":{\"id\":\"3695\"},\"glyph\":{\"id\":\"3718\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"3719\"},\"selection_glyph\":null,\"view\":{\"id\":\"3721\"}},\"id\":\"3720\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"factors\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"WGS\",\"Targeted-Capture\",\"OTHER\",\"Synthetic-Long-Read\"]},\"id\":\"3698\",\"type\":\"FactorRange\"},{\"attributes\":{\"callback\":null,\"tooltips\":[[\"SRA 
accessions\",\"@N\"],[\"BioProjects\",\"@P\"]]},\"id\":\"3713\",\"type\":\"HoverTool\"},{\"attributes\":{\"axis\":{\"id\":\"3706\"},\"grid_line_color\":null,\"ticker\":null},\"id\":\"3708\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"3722\",\"type\":\"LogTicker\"},{\"attributes\":{\"axis\":{\"id\":\"3709\"},\"dimension\":1,\"grid_line_color\":null,\"ticker\":null},\"id\":\"3711\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"3707\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"format\":\"%d\"},\"id\":\"3723\",\"type\":\"PrintfTickFormatter\"},{\"attributes\":{},\"id\":\"3702\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3730\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{},\"id\":\"3710\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3730\"},\"major_label_orientation\":1.5707963267948966,\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3707\"}},\"id\":\"3706\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"high\":149668,\"low\":1,\"palette\":[\"#FDE724\",\"#F1E51C\",\"#E7E419\",\"#DCE218\",\"#D2E11B\",\"#C7E01F\",\"#BDDE26\",\"#B2DD2C\",\"#A7DB33\",\"#9DD93A\",\"#92D741\",\"#88D547\",\"#7ED24E\",\"#74D054\",\"#6BCD59\",\"#62CA5F\",\"#59C764\",\"#51C468\",\"#49C16D\",\"#42BE71\",\"#3BBA75\",\"#35B778\",\"#2EB27C\",\"#29AF7F\",\"#25AB81\",\"#22A784\",\"#20A485\",\"#1EA087\",\"#1E9C89\",\"#1E998A\",\"#1F958B\",\"#20918C\",\"#218D8C\",\"#22898D\",\"#24868D\",\"#25828E\",\"#277E8E\",\"#287A8E\",\"#2A778E\",\"#2B738E\",\"#2D6F8E\",\"#2E6B8E\",\"#30678D\",\"#32628D\",\"#345E8D\",\"#365A8C\",\"#38568B\",\"#3A528B\",\"#3C4D8A\",\"#3E4989\",\"#404487\",\"#424085\",\"#433B83\",\"#453681\",\"#46317E\",\"#472C7B\",\"#472777\",\"#482273\",\"#481D6F\",\"#47186A\",\"#471265\",\"#460C5F\",\"#45065A\",\"#440154\"]},\"id\":\"3696\",\"type\":\"LogColorMapper\"},{\"attributes\":{},\"id\":\"3734\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3732\"
},\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3710\"}},\"id\":\"3709\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"3712\"},{\"id\":\"3713\"}]},\"id\":\"3714\",\"type\":\"Toolbar\"},{\"attributes\":{\"color_mapper\":{\"id\":\"3696\"},\"formatter\":{\"id\":\"3723\"},\"label_standoff\":12,\"location\":[0,0],\"ticker\":{\"id\":\"3722\"}},\"id\":\"3724\",\"type\":\"ColorBar\"},{\"attributes\":{},\"id\":\"3712\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"3704\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3733\",\"type\":\"Selection\"},{\"attributes\":{\"factors\":[\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\"]},\"id\":\"3700\",\"type\":\"FactorRange\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3696\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3719\",\"type\":\"Rect\"},{\"attributes\":{\"source\":{\"id\":\"3695\"}},\"id\":\"3721\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"3732\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3696\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3718\",\"type\":\"Rect\"},{\"attributes\":{\"data\":{\"LibraryStrategy\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"AMPLICON\",\"AMPLICON\",\"WGS\",\"RNA-Seq\",\"Targeted-Capture\",\"WGA\",\"OTHER\",\"AMPLICON\",\"RNA-Seq\",\"WGS\",\"AMPLICON\",\"WGS\",\"WGA\",\"RNA-Seq\",\"OTHER\",\"AMPLICON\",\"Synthetic-Long-Read\"]
,\"N\":[21,1,1,7,149668,6201,4434,1690,377,148,435,42,33,25754,936,580,10,4,12,2],\"P\":[1,1,1,1,61,43,33,11,4,13,7,4,6,32,12,3,5,1,1,1],\"Platform\":[\"BGISEQ\",\"BGISEQ\",\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ION_TORRENT\",\"ION_TORRENT\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\",\"PACBIO_SMRT\"],\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]},\"selected\":{\"id\":\"3733\"},\"selection_policy\":{\"id\":\"3734\"}},\"id\":\"3695\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"text\":\"\"},\"id\":\"3728\",\"type\":\"Title\"}],\"root_ids\":[\"3697\"]},\"title\":\"Bokeh Application\",\"version\":\"2.0.0\"}};\n", " var render_items = [{\"docid\":\"c07fda4c-7d75-4f8b-bcef-d0eb9b845f75\",\"root_ids\":[\"3697\"],\"roots\":{\"3697\":\"96659d31-0192-4681-96d3-cb1de4e276ff\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "tags": [], "application/vnd.bokehjs_exec.v0+json": { "id": "3697" } } } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 317 }, "id": "ZxCgvj7KPP5v", "outputId": "49b1c64c-e148-4f61-fcdc-7bf6ffd31360" }, "source": [ "" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "application/javascript": [ "\n", 
"(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = 
handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a 
new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(null);\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " 
console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " \n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to 
plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(null);\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n 
element.onerror = on_error;\n element.async = false;\n element.src = url;\n \n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.0.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.0.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": { "tags": [] } }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": { "tags": [] } }, { "output_type": "display_data", "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"c58613ee-1a5a-4479-99e2-441891cef0d6\":{\"roots\":{\"references\":[{\"attributes\":{\"above\":[{\"id\":\"3218\"}],\"center\":[{\"id\":\"3220\"},{\"id\":\"3223\"}],\"left\":[{\"id\":\"3221\"}],\"plot_height\":300,\"plot_width\":400,\"renderers\":[{\"id\":\"3232\"}],\"right\":[{\"id\":\"3236\"}],\"title\":{\"id\":\"3240\"},\"toolbar\":{\"id\":\"3226\"},\"toolbar_location\":null,\"x_range\":{\"id\":\"3210\"},\"x_scale\":{\"id\":\"3214\"},\"y_range\":{\"id\":\"3212\"},\"y_scale\":{\"id\":\"3216\"}},\"id\":\"3209\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"axis\":{\"id\":\"3221\"},\"dimension\":1,\"grid_line_color\":null,\"ticker\":null},\"id\":\"3223\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3208\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3231\",\"type\":\"Rect\"},{\"attributes\":{\"data_source\":{\"id\":\"3207\"},\"glyph\":{\"id\":\"3230\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"3231\"},\"selection_glyph\":null,\"view\":{\"id\":\"3233\"}},\"id\":\"3232\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"high\":149668,\"low\":1,\"palette\":[\"#FDE724\",\"#F1E51C\",\"#E7E419\",\"#DCE218\",\"#D2E11B\",\"#C7E01F\",\"#BDDE26\",\"#B2DD2C\",\"#A7DB33\",\"#9DD93A\",\"#92D741\",\"#88D547\",\"#7ED24E\",\"#74D054\",\"#6BCD59\",\"#62CA5F\",\"#59C764\",\"#51C468\",\"#49C16D\",\"#42BE71\",\"#3BBA75\",\"#35B778\",\"#2EB27C\",\"#29AF7F\",\"#25AB81\",\"#22A784\",\"#20A485\",\"#1EA087\",\"#1E9C89\",\"#1E998A\",\"#1F958B\",\"#20918C\",\"#218D8C\",\"#22898D\",\"#24868D\",\"#
25828E\",\"#277E8E\",\"#287A8E\",\"#2A778E\",\"#2B738E\",\"#2D6F8E\",\"#2E6B8E\",\"#30678D\",\"#32628D\",\"#345E8D\",\"#365A8C\",\"#38568B\",\"#3A528B\",\"#3C4D8A\",\"#3E4989\",\"#404487\",\"#424085\",\"#433B83\",\"#453681\",\"#46317E\",\"#472C7B\",\"#472777\",\"#482273\",\"#481D6F\",\"#47186A\",\"#471265\",\"#460C5F\",\"#45065A\",\"#440154\"]},\"id\":\"3208\",\"type\":\"LogColorMapper\"},{\"attributes\":{\"source\":{\"id\":\"3207\"}},\"id\":\"3233\",\"type\":\"CDSView\"},{\"attributes\":{\"factors\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"WGS\",\"Targeted-Capture\",\"OTHER\",\"Synthetic-Long-Read\"]},\"id\":\"3210\",\"type\":\"FactorRange\"},{\"attributes\":{\"fill_color\":{\"field\":\"N\",\"transform\":{\"id\":\"3208\"}},\"height\":{\"units\":\"data\",\"value\":1},\"line_color\":{\"value\":null},\"width\":{\"units\":\"data\",\"value\":1},\"x\":{\"field\":\"LibraryStrategy\"},\"y\":{\"field\":\"Platform\"}},\"id\":\"3230\",\"type\":\"Rect\"},{\"attributes\":{},\"id\":\"3234\",\"type\":\"LogTicker\"},{\"attributes\":{},\"id\":\"3219\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"data\":{\"LibraryStrategy\":[\"AMPLICON\",\"RNA-Seq\",\"WGA\",\"AMPLICON\",\"AMPLICON\",\"WGS\",\"RNA-Seq\",\"Targeted-Capture\",\"WGA\",\"OTHER\",\"AMPLICON\",\"RNA-Seq\",\"WGS\",\"AMPLICON\",\"WGS\",\"WGA\",\"RNA-Seq\",\"OTHER\",\"AMPLICON\",\"Synthetic-Long-Read\"],\"N\":[21,1,1,7,149668,6201,4434,1690,377,148,435,42,33,25754,936,580,10,4,12,2],\"P\":[1,1,1,1,61,43,33,11,4,13,7,4,6,32,12,3,5,1,1,1],\"Platform\":[\"BGISEQ\",\"BGISEQ\",\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ILLUMINA\",\"ION_TORRENT\",\"ION_TORRENT\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\",\"PACBIO_SMRT\"],\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]},\"selected\":{\"id\":\"3245\"},\"selection_policy\":{\"id\":\"3246\"}},\"id\":\"3207\",\"type\":\"ColumnDataSource
\"},{\"attributes\":{\"format\":\"%d\"},\"id\":\"3235\",\"type\":\"PrintfTickFormatter\"},{\"attributes\":{\"color_mapper\":{\"id\":\"3208\"},\"formatter\":{\"id\":\"3235\"},\"label_standoff\":12,\"location\":[0,0],\"ticker\":{\"id\":\"3234\"}},\"id\":\"3236\",\"type\":\"ColorBar\"},{\"attributes\":{},\"id\":\"3224\",\"type\":\"SaveTool\"},{\"attributes\":{\"factors\":[\"BGISEQ\",\"CAPILLARY\",\"ILLUMINA\",\"ION_TORRENT\",\"OXFORD_NANOPORE\",\"PACBIO_SMRT\"]},\"id\":\"3212\",\"type\":\"FactorRange\"},{\"attributes\":{},\"id\":\"3222\",\"type\":\"CategoricalTicker\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3244\"},\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3222\"}},\"id\":\"3221\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"axis_line_color\":null,\"formatter\":{\"id\":\"3242\"},\"major_label_orientation\":1.5707963267948966,\"major_tick_line_color\":null,\"ticker\":{\"id\":\"3219\"}},\"id\":\"3218\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"axis\":{\"id\":\"3218\"},\"grid_line_color\":null,\"ticker\":null},\"id\":\"3220\",\"type\":\"Grid\"},{\"attributes\":{\"text\":\"\"},\"id\":\"3240\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"3245\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"3242\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{},\"id\":\"3216\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"3244\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"3224\"},{\"id\":\"3225\"}]},\"id\":\"3226\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"3214\",\"type\":\"CategoricalScale\"},{\"attributes\":{\"callback\":null,\"tooltips\":[[\"SRA accessions\",\"@N\"],[\"BioProjects\",\"@P\"]]},\"id\":\"3225\",\"type\":\"HoverTool\"},{\"attributes\":{},\"id\":\"3246\",\"type\":\"UnionRenderers\"}],\"root_ids\":[\"3209\"]},\"title\":\"Bokeh 
Application\",\"version\":\"2.0.0\"}};\n", " var render_items = [{\"docid\":\"c58613ee-1a5a-4479-99e2-441891cef0d6\",\"root_ids\":[\"3209\"],\"roots\":{\"3209\":\"3e142481-48c2-4c2a-809e-8822fe0f9dc9\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "tags": [], "application/vnd.bokehjs_exec.v0+json": { "id": "3209" } } } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RCjkpNswK-2O", "outputId": "3de3d21a-e55d-4f56-e550-9f6401284cbb" }, "source": [ "# Deploy to datapane\n", "r.publish(name='SRA stats by Platform and Library Type', open=True)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Publishing report and associated data - please wait..\n", "Report successfully published at https://datapane.com/u/nekrut/reports/sra-stats-by-platform-and-library-type/\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "GADMAmNks6Ei", "colab": { "base_uri": "https://localhost:8080/", "height": 297 }, "outputId": "c3175c1b-5302-45ec-acc6-d78b43e56d9d" }, "source": [ "counts.pivot(index='LibraryStrategy',columns='Platform',values='N')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlatformBGISEQCAPILLARYILLUMINAION_TORRENTOXFORD_NANOPOREPACBIO_SMRT
LibraryStrategy
AMPLICON21.07.0149668.0435.025754.012.0
OTHERNaNNaN148.0NaN4.0NaN
RNA-Seq1.0NaN4434.042.010.0NaN
Synthetic-Long-ReadNaNNaNNaNNaNNaN2.0
Targeted-CaptureNaNNaN1690.0NaNNaNNaN
WGA1.0NaN377.0NaN580.0NaN
WGSNaNNaN6201.033.0936.0NaN
\n", "
" ], "text/plain": [ "Platform BGISEQ CAPILLARY ILLUMINA ION_TORRENT OXFORD_NANOPORE PACBIO_SMRT\n", "LibraryStrategy \n", "AMPLICON 21.0 7.0 149668.0 435.0 25754.0 12.0\n", "OTHER NaN NaN 148.0 NaN 4.0 NaN\n", "RNA-Seq 1.0 NaN 4434.0 42.0 10.0 NaN\n", "Synthetic-Long-Read NaN NaN NaN NaN NaN 2.0\n", "Targeted-Capture NaN NaN 1690.0 NaN NaN NaN\n", "WGA 1.0 NaN 377.0 NaN 580.0 NaN\n", "WGS NaN NaN 6201.0 33.0 936.0 NaN" ] }, "metadata": { "tags": [] }, "execution_count": 27 } ] }, { "cell_type": "markdown", "metadata": { "id": "UDkJYKd8nu4z" }, "source": [ "Individual SRA runs are organized into SRAStudies or BioProjects:" ] }, { "cell_type": "code", "metadata": { "id": "bNvCsmIllQrG", "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "outputId": "fdcb6491-44ba-4f35-e0fc-a9159b7f8c81" }, "source": [ "pysqldf('select SRAStudy, count(*) as N from ncbi group by SRAStudy order by N desc').head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SRAStudyN
0ERP12122869982
1SRP25379810840
2SRP2539262109
3SRP2769041536
4SRP2664651486
\n", "
" ], "text/plain": [ " SRAStudy N\n", "0 ERP121228 69982\n", "1 SRP253798 10840\n", "2 SRP253926 2109\n", "3 SRP276904 1536\n", "4 SRP266465 1486" ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "code", "metadata": { "id": "NzYXpCD9kxla", "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "outputId": "e22723b0-32df-4062-d384-b5e6e9c868ab" }, "source": [ "pysqldf('select BioProject, count(*) as N from ncbi group by BioProject order by N desc').head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BioProjectN
0PRJEB3788669982
1PRJNA61395810840
2PRJNA6149952109
3PRJNA6555771536
4PRJNA6228371486
\n", "
" ], "text/plain": [ " BioProject N\n", "0 PRJEB37886 69982\n", "1 PRJNA613958 10840\n", "2 PRJNA614995 2109\n", "3 PRJNA655577 1536\n", "4 PRJNA622837 1486" ] }, "metadata": { "tags": [] }, "execution_count": 11 } ] }, { "cell_type": "code", "metadata": { "id": "slEdrloZ4oEE" }, "source": [ "top_rnaseq = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' and LibraryLayout = 'PAIRED' and LibraryStrategy = 'RNA-Seq' group by BioProject order by N desc limit 10\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "OhZrJDvic63t" }, "source": [ "top_amp_ill = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' and LibraryLayout = 'PAIRED' and LibraryStrategy = 'AMPLICON' group by BioProject order by N desc limit 10\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "DJeAjfuLc6-E" }, "source": [ "top_ont_amp = pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'OXFORD_NANOPORE' and LibraryStrategy = 'AMPLICON' group by BioProject order by N desc limit 10\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sHje2tH-ftjm", "outputId": "5a9ac22c-eb0d-416f-f98b-4dcc3e6ea5d2" }, "source": [ "print(pd.concat([top_rnaseq,top_amp_ill,top_ont_amp],axis=1).to_markdown(index=False))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "| BioProject | N | BioProject | N | BioProject | N |\n", "|:-------------|-----:|:-------------|-------:|:-------------|------:|\n", "| PRJNA622837 | 1564 | PRJEB37886 | 104984 | PRJEB37886 | 20968 |\n", "| PRJNA612578 | 964 | PRJNA613958 | 14860 | PRJEB40277 | 1130 |\n", "| PRJNA650245 | 617 | PRJNA614995 | 3967 | PRJEB39014 | 944 |\n", "| PRJNA610428 | 42 | PRJNA645906 | 2286 | PRJEB38388 | 584 |\n", "| PRJEB38546 | 26 | PRJNA639066 | 1931 | PRJEB39487 | 339 |\n", "| 
PRJNA634356 | 25 | PRJNA625551 | 1163 | PRJNA669043 | 255 |\n", "| PRJNA650134 | 22 | PRJNA656534 | 567 | PRJNA669553 | 228 |\n", "| PRJNA661544 | 15 | PRJNA686984 | 543 | PRJNA650037 | 210 |\n", "| PRJNA638211 | 10 | PRJEB38723 | 542 | PRJNA645970 | 173 |\n", "| PRJNA605983 | 9 | PRJEB42024 | 539 | PRJNA610248 | 162 |\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "D8Qb9Fvr4psL" }, "source": [ "For the moment, let us restrict ourselves to Illumina data that is in paired library configuration and is not ampliconic." ] }, { "cell_type": "code", "metadata": { "id": "yj0S3_pJ4pKo", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "74740444-343e-47c7-a316-7b0d0cb42739" }, "source": [ "pysqldf('select BioProject,count(*) as N from ncbi_il_pe_nonAmp group by BioProject order by N desc')" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BioProjectN
0PRJNA6310612829
1PRJNA6228371564
2PRJNA612578964
3PRJNA650245864
4PRJEB37513244
5PRJEB37886197
6PRJEB39761193
7PRJNA691556120
8PRJNA667180102
9PRJNA66994569
10PRJNA67584066
11PRJNA68102057
12PRJNA68387357
13PRJNA64830656
14PRJNA64910148
15PRJNA63247547
16PRJNA63986446
17PRJNA61042842
18PRJNA68736634
19PRJNA66268433
20PRJNA64505230
21PRJNA68222327
22PRJEB3854626
23PRJNA63435625
24PRJNA65929323
25PRJNA65013422
26PRJNA64504821
27PRJNA68201320
28PRJNA64505117
29PRJEB4018816
30PRJNA66154415
31PRJNA66340214
32PRJNA68273514
33PRJNA63104212
34PRJNA66669612
35PRJNA63821110
36PRJNA69265310
37PRJNA6059839
38PRJNA6059078
39PRJNA6728118
40PRJNA6164467
41PRJNA6686317
42PRJNA6794567
43PRJEB381016
44PRJEB383516
45PRJNA6153196
46PRJNA6273546
47PRJNA6354436
48PRJEB396325
49PRJNA6280435
50PRJNA6395915
51PRJNA6459065
52PRJNA6730555
53PRJNA6079484
54PRJNA6279774
55PRJNA6341944
56PRJNA6364464
57PRJNA6453424
58PRJNA6230013
59PRJNA6443573
60PRJNA6688893
61PRJNA2312212
62PRJNA6243582
63PRJNA6247922
64PRJNA6435742
65PRJNA6578932
66PRJNA6638612
67PRJNA6695532
68PRJNA6747962
69PRJNA6810382
70PRJEB384591
71PRJEB397371
72PRJEB412161
73PRJNA6086511
74PRJNA6237971
75PRJNA6238951
76PRJNA6242311
77PRJNA6256691
78PRJNA6265261
79PRJNA6307161
80PRJNA6332411
81PRJNA6350171
82PRJNA6360041
83PRJNA6378921
84PRJNA6470571
85PRJNA6579381
86PRJNA6579851
87PRJNA6582111
88PRJNA6582421
89PRJNA6661891
90PRJNA6797861
91PRJNA6890001
\n", "
" ], "text/plain": [ " BioProject N\n", "0 PRJNA631061 2829\n", "1 PRJNA622837 1564\n", "2 PRJNA612578 964\n", "3 PRJNA650245 864\n", "4 PRJEB37513 244\n", "5 PRJEB37886 197\n", "6 PRJEB39761 193\n", "7 PRJNA691556 120\n", "8 PRJNA667180 102\n", "9 PRJNA669945 69\n", "10 PRJNA675840 66\n", "11 PRJNA681020 57\n", "12 PRJNA683873 57\n", "13 PRJNA648306 56\n", "14 PRJNA649101 48\n", "15 PRJNA632475 47\n", "16 PRJNA639864 46\n", "17 PRJNA610428 42\n", "18 PRJNA687366 34\n", "19 PRJNA662684 33\n", "20 PRJNA645052 30\n", "21 PRJNA682223 27\n", "22 PRJEB38546 26\n", "23 PRJNA634356 25\n", "24 PRJNA659293 23\n", "25 PRJNA650134 22\n", "26 PRJNA645048 21\n", "27 PRJNA682013 20\n", "28 PRJNA645051 17\n", "29 PRJEB40188 16\n", "30 PRJNA661544 15\n", "31 PRJNA663402 14\n", "32 PRJNA682735 14\n", "33 PRJNA631042 12\n", "34 PRJNA666696 12\n", "35 PRJNA638211 10\n", "36 PRJNA692653 10\n", "37 PRJNA605983 9\n", "38 PRJNA605907 8\n", "39 PRJNA672811 8\n", "40 PRJNA616446 7\n", "41 PRJNA668631 7\n", "42 PRJNA679456 7\n", "43 PRJEB38101 6\n", "44 PRJEB38351 6\n", "45 PRJNA615319 6\n", "46 PRJNA627354 6\n", "47 PRJNA635443 6\n", "48 PRJEB39632 5\n", "49 PRJNA628043 5\n", "50 PRJNA639591 5\n", "51 PRJNA645906 5\n", "52 PRJNA673055 5\n", "53 PRJNA607948 4\n", "54 PRJNA627977 4\n", "55 PRJNA634194 4\n", "56 PRJNA636446 4\n", "57 PRJNA645342 4\n", "58 PRJNA623001 3\n", "59 PRJNA644357 3\n", "60 PRJNA668889 3\n", "61 PRJNA231221 2\n", "62 PRJNA624358 2\n", "63 PRJNA624792 2\n", "64 PRJNA643574 2\n", "65 PRJNA657893 2\n", "66 PRJNA663861 2\n", "67 PRJNA669553 2\n", "68 PRJNA674796 2\n", "69 PRJNA681038 2\n", "70 PRJEB38459 1\n", "71 PRJEB39737 1\n", "72 PRJEB41216 1\n", "73 PRJNA608651 1\n", "74 PRJNA623797 1\n", "75 PRJNA623895 1\n", "76 PRJNA624231 1\n", "77 PRJNA625669 1\n", "78 PRJNA626526 1\n", "79 PRJNA630716 1\n", "80 PRJNA633241 1\n", "81 PRJNA635017 1\n", "82 PRJNA636004 1\n", "83 PRJNA637892 1\n", "84 PRJNA647057 1\n", "85 PRJNA657938 1\n", "86 PRJNA657985 1\n", "87 PRJNA658211 
1\n", "88 PRJNA658242 1\n", "89 PRJNA666189 1\n", "90 PRJNA679786 1\n", "91 PRJNA689000 1" ] }, "metadata": { "tags": [] }, "execution_count": 24 } ] }, { "cell_type": "code", "metadata": { "id": "fGMN6u0Bgc8l", "colab": { "base_uri": "https://localhost:8080/", "height": 564 }, "outputId": "43508bf2-3b7d-41cd-d396-0b42969b5d14" }, "source": [ "pysqldf('select * from ncbi_il_pe_nonAmp where BioProject = \"PRJNA622837\"').head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RunReleaseDateLoadDatespotsbasesspots_with_matesavgLengthsize_MBAssemblyNamedownload_pathExperimentLibraryNameLibraryStrategyLibrarySelectionLibrarySourceLibraryLayoutInsertSizeInsertDevPlatformModelSRAStudyBioProjectStudy_Pubmed_idProjectIDSampleBioSampleSampleTypeTaxIDScientificNameSampleNameg1k_pop_codesourceg1k_analysis_groupSubject_IDSexDiseaseTumorAffection_StatusAnalyte_TypeHistological_TypeBody_SiteCenterNameSubmissiondbgap_study_accessionConsentRunHashReadHash
0SRR127339442020-09-28 21:32:302020-09-28 17:11:0283861916940103883861920254Nonehttps://sra-download.ncbi.nlm.nih.gov/traces/s...SRX9207340SAMN15751626_ERCC-00162_SSIII_Random_Hexamers_...RNA-SeqcDNAVIRAL RNAPAIRED00ILLUMINAIllumina NovaSeq 6000SRP266465PRJNA622837None622837SRS7442251SAMN15751626simple2697049Severe acute respiratory syndrome coronavirus 2MA_DPH_00008NoneNoneNoneNoneNoneNonenoNoneNoneNoneNoneBROAD INSTITUTE OF HARVARD AND MITSRA1133927Nonepublic4DD156B9C9843DCE60F02F2499657030BCE66AD150CA6DE46FCDE0276A29C7D4
1SRR127339742020-09-28 21:32:302020-09-28 17:11:1817663303567986601766330202121Nonehttps://sra-download.ncbi.nlm.nih.gov/traces/s...SRX9207310SAMN15751630_ERCC-00022_RandomPrimer-SSIV_Next...RNA-SeqcDNAVIRAL RNAPAIRED00ILLUMINAIllumina NovaSeq 6000SRP266465PRJNA622837None622837SRS7442220SAMN15751630simple2697049Severe acute respiratory syndrome coronavirus 2MA_DPH_00012NoneNoneNoneNoneNoneNonenoNoneNoneNoneNoneBROAD INSTITUTE OF HARVARD AND MITSRA1133927Nonepublic23BDDDEB5E2875CA2CF1838AC088D99C001DF0E80C7167DAB21F6DAB4B58EBFE
2SRR127339632020-09-28 21:32:302020-09-28 17:11:1620160514072423022016051202137Nonehttps://sra-download.ncbi.nlm.nih.gov/traces/s...SRX9207321SAMN15751631_ERCC-00042_RandomPrimer-SSIV_Next...RNA-SeqcDNAVIRAL RNAPAIRED00ILLUMINAIllumina NovaSeq 6000SRP266465PRJNA622837None622837SRS7442231SAMN15751631simple2697049Severe acute respiratory syndrome coronavirus 2MA_DPH_00013NoneNoneNoneNoneNoneNonenoNoneNoneNoneNoneBROAD INSTITUTE OF HARVARD AND MITSRA1133927Nonepublic4518EFAB47FC9AA6ED115887876ACAAC94696946332E59F2316E819B16F8AAB2
3SRR127339282020-09-28 21:32:312020-09-28 17:11:2225389445128666882538944202178Nonehttps://sra-download.ncbi.nlm.nih.gov/traces/s...SRX9207356SAMN15751632_ERCC-00061_RandomPrimer-SSIV_Next...RNA-SeqcDNAVIRAL RNAPAIRED00ILLUMINAIllumina NovaSeq 6000SRP266465PRJNA622837None622837SRS7442266SAMN15751632simple2697049Severe acute respiratory syndrome coronavirus 2MA_DPH_00014NoneNoneNoneNoneNoneNonenoNoneNoneNoneNoneBROAD INSTITUTE OF HARVARD AND MITSRA1133927Nonepublic0EDB75C66797A6F08A49823E14728F937AEC7442D374D08AE76DC6EB65EF9D02
4SRR127339172020-09-28 21:32:312020-09-28 17:11:1324429814934821622442981202162Nonehttps://sra-download.ncbi.nlm.nih.gov/traces/s...SRX9207367SAMN15751633_ERCC-00081_RandomPrimer-SSIV_Next...RNA-SeqcDNAVIRAL RNAPAIRED00ILLUMINAIllumina NovaSeq 6000SRP266465PRJNA622837None622837SRS7442279SAMN15751633simple2697049Severe acute respiratory syndrome coronavirus 2MA_DPH_00015NoneNoneNoneNoneNoneNonenoNoneNoneNoneNoneBROAD INSTITUTE OF HARVARD AND MITSRA1133927NonepublicDB050D2009449E5EB2C0787C0AAE7FA5E1A7064BE27351B0E4EA94EBD94118F0
\n", "
" ], "text/plain": [ " Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n", "0 SRR12733944 2020-09-28 21:32:30 2020-09-28 17:11:02 838619 169401038 838619 202 54 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207340 SAMN15751626_ERCC-00162_SSIII_Random_Hexamers_... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442251 SAMN15751626 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00008 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 4DD156B9C9843DCE60F02F2499657030 BCE66AD150CA6DE46FCDE0276A29C7D4\n", "1 SRR12733974 2020-09-28 21:32:30 2020-09-28 17:11:18 1766330 356798660 1766330 202 121 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207310 SAMN15751630_ERCC-00022_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442220 SAMN15751630 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00012 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 23BDDDEB5E2875CA2CF1838AC088D99C 001DF0E80C7167DAB21F6DAB4B58EBFE\n", "2 SRR12733963 2020-09-28 21:32:30 2020-09-28 17:11:16 2016051 407242302 2016051 202 137 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207321 SAMN15751631_ERCC-00042_RandomPrimer-SSIV_Next... 
RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442231 SAMN15751631 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00013 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 4518EFAB47FC9AA6ED115887876ACAAC 94696946332E59F2316E819B16F8AAB2\n", "3 SRR12733928 2020-09-28 21:32:31 2020-09-28 17:11:22 2538944 512866688 2538944 202 178 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207356 SAMN15751632_ERCC-00061_RandomPrimer-SSIV_Next... RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442266 SAMN15751632 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00014 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public 0EDB75C66797A6F08A49823E14728F93 7AEC7442D374D08AE76DC6EB65EF9D02\n", "4 SRR12733917 2020-09-28 21:32:31 2020-09-28 17:11:13 2442981 493482162 2442981 202 162 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX9207367 SAMN15751633_ERCC-00081_RandomPrimer-SSIV_Next... 
RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina NovaSeq 6000 SRP266465 PRJNA622837 None 622837 SRS7442279 SAMN15751633 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_DPH_00015 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1133927 None public DB050D2009449E5EB2C0787C0AAE7FA5 E1A7064BE27351B0E4EA94EBD94118F0" ] }, "metadata": { "tags": [] }, "execution_count": 16 } ] }, { "cell_type": "code", "metadata": { "id": "u2aqtENrpxZ6", "colab": { "base_uri": "https://localhost:8080/", "height": 111 }, "outputId": "71e748ab-4553-4884-bc52-b3a0fdabbf84" }, "source": [ "pysqldf('select Model, count(*) as N from ncbi_il_pe_nonAmp where BioProject = \"PRJNA622837\" group by Model').head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelN
0Illumina HiSeq 250060
1Illumina NovaSeq 60001426
\n", "
" ], "text/plain": [ " Model N\n", "0 Illumina HiSeq 2500 60\n", "1 Illumina NovaSeq 6000 1426" ] }, "metadata": { "tags": [] }, "execution_count": 17 } ] }, { "cell_type": "code", "metadata": { "id": "bteaPrVSqbFR", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "93f3822c-408b-484e-e54f-d37b9f48b0e2" }, "source": [ "pysqldf('select SampleName, count(*) as N from ncbi_il_pe_nonAmp where BioProject = \"PRJNA622837\" group by SampleName order by N desc').head(100)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " SampleName N\n", "0 MA_MGH_00001 6\n", "1 MA_MGH_00002 6\n", "2 MA_MGH_00003 6\n", "3 MA_MGH_00004 6\n", "4 MA_MGH_00005 6\n", "5 MA_MGH_00006 6\n", "6 MA_MGH_00007 6\n", "7 MA_MGH_00008 6\n", "8 MA_MGH_00009 6\n", "9 MA_MGH_00010 6\n", "10 MA_DPH_00001 1\n", "11 MA_DPH_00002 1\n", "12 MA_DPH_00003 1\n", "13 MA_DPH_00004 1\n", "14 MA_DPH_00005 1\n", "15 MA_DPH_00006 1\n", "16 MA_DPH_00007 1\n", "17 MA_DPH_00008 1\n", "18 MA_DPH_00009 1\n", "19 MA_DPH_00010 1\n", "20 MA_DPH_00011 1\n", "21 MA_DPH_00012 1\n", "22 MA_DPH_00013 1\n", "23 MA_DPH_00014 1\n", "24 MA_DPH_00015 1\n", "25 MA_DPH_00016 1\n", "26 MA_DPH_00017 1\n", "27 MA_DPH_00018 1\n", "28 MA_DPH_00019 1\n", "29 MA_DPH_00020 1\n", "30 MA_DPH_00021 1\n", "31 MA_DPH_00022 1\n", "32 MA_DPH_00023 1\n", "33 MA_DPH_00024 1\n", "34 MA_DPH_00025 1\n", "35 MA_DPH_00026 1\n", "36 MA_DPH_00027 1\n", "37 MA_DPH_00028 1\n", "38 MA_DPH_00029 1\n", "39 MA_DPH_00030 1\n", "40 MA_DPH_00031 1\n", "41 MA_DPH_00032 1\n", "42 MA_DPH_00033 1\n", "43 MA_DPH_00034 1\n", "44 MA_DPH_00035 1\n", "45 MA_DPH_00036 1\n", "46 MA_DPH_00037 1\n", "47 MA_DPH_00038 1\n", "48 MA_DPH_00039 1\n", "49 MA_DPH_00040 1\n", "50 MA_DPH_00041 1\n", "51 MA_DPH_00042 1\n", "52 MA_DPH_00043 1\n", "53 MA_DPH_00044 1\n", "54 MA_DPH_00045 1\n", "55 MA_DPH_00046 1\n", "56 MA_DPH_00047 1\n", "57 MA_DPH_00048 1\n", "58 MA_DPH_00049 1\n", "59 MA_DPH_00050 1\n", "60 MA_DPH_00051 1\n", "61 MA_DPH_00052 1\n", "62 MA_DPH_00053 1\n", "63 MA_DPH_00054 1\n", "64 MA_DPH_00055 1\n", "65 MA_DPH_00056 1\n", "66 MA_DPH_00057 1\n", "67 MA_DPH_00058 1\n", "68 MA_DPH_00060 1\n", "69 MA_DPH_00061 1\n", "70 MA_DPH_00062 1\n", "71 MA_DPH_00063 1\n", "72 MA_DPH_00064 1\n", "73 MA_DPH_00065 1\n", "74 MA_DPH_00066 1\n", "75 MA_DPH_00067 1\n", "76 MA_DPH_00068 1\n", "77 MA_DPH_00069 1\n", "78 MA_DPH_00071 1\n", "79 MA_DPH_00072 1\n", "80 MA_DPH_00073 1\n", "81 MA_DPH_00074 1\n", "82 MA_DPH_00075 1\n", "83 MA_DPH_00076 1\n", "84 MA_DPH_00077 1\n", "85 
MA_DPH_00078 1\n", "86 MA_DPH_00079 1\n", "87 MA_DPH_00080 1\n", "88 MA_DPH_00081 1\n", "89 MA_DPH_00082 1\n", "90 MA_DPH_00083 1\n", "91 MA_DPH_00084 1\n", "92 MA_DPH_00085 1\n", "93 MA_DPH_00086 1\n", "94 MA_DPH_00087 1\n", "95 MA_DPH_00089 1\n", "96 MA_DPH_00090 1\n", "97 MA_DPH_00091 1\n", "98 MA_DPH_00092 1\n", "99 MA_DPH_00093 1" ] }, "metadata": { "tags": [] }, "execution_count": 20 } ] }, { "cell_type": "code", "metadata": { "id": "c_xq0nKcqnxK", "colab": { "base_uri": "https://localhost:8080/", "height": 663 }, "outputId": "334cdf3d-10d9-46af-a9d8-36082de9a845" }, "source": [ "pysqldf(\"select * from ncbi_il_pe_nonAmp where SampleName = 'MA_MGH_00001'\")" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n", "0 SRR11954059 2020-06-08 14:49:46 2020-06-08 14:40:42 42969 8679738 42969 202 4 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498570 SAMN14938611_xGen_3_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public AB5FB1C49D30050D7F36A91653D8D0DF 44AC9D5B9AB98C13C71163EBB4ECD7AD\n", "1 SRR11954164 2020-06-08 14:52:39 2020-06-08 14:41:48 73987 14945374 73987 202 7 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498465 SAMN14938611_xGen_2_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public 7BA7FA8B66CAA6DEEEBB8C11712C96B3 B437DB6AB394C49E0D4C79CF5F153CE6\n", "2 SRR11953812 2020-06-08 14:49:46 2020-06-08 14:38:15 300823 60766246 300823 202 29 None https://sra-download.ncbi.nlm.nih.gov/traces/s... 
SRX8498307 SAMN14938611_Next_2_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public E0168B308E9FF193D4C850E01BD038EA AB695CA8BBA75744D024DAFAD436F164\n", "3 SRR11954281 2020-06-08 14:52:37 2020-06-08 14:42:54 476168 96185936 476168 202 42 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8498349 SAMN14938611_Next_3_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public B8DACF8D30B342B6626A11336FAFB574 2199CC4352E87B7D164769903E1593D9\n", "4 SRR11953758 2020-06-08 14:40:57 2020-06-08 14:37:21 227460 45946920 227460 202 20 None https://sra-download.ncbi.nlm.nih.gov/traces/s... SRX8497981 SAMN14938611_Next_1_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public E49F808F6B2715D0804D83AE1BFAFE6E 0E59993DF940CEF7B15ACD859BDAAF46\n", "5 SRR11953779 2020-06-08 14:49:42 2020-06-08 14:37:40 100368 20274336 100368 202 10 None https://sra-download.ncbi.nlm.nih.gov/traces/s... 
SRX8497960 SAMN14938611_xGen_1_ERCC-41 RNA-Seq cDNA VIRAL RNA PAIRED 0 0 ILLUMINA Illumina HiSeq 2500 SRP266465 PRJNA622837 None 622837 SRS6796757 SAMN14938611 simple 2697049 Severe acute respiratory syndrome coronavirus 2 MA_MGH_00001 None None None None None None no None None None None BROAD INSTITUTE OF HARVARD AND MIT SRA1084625 None public AD83F378D559074CA78EE17C95AA00F0 B0D6D7BD7174939AF6B7D0AE76A1C93A" ] }, "metadata": { "tags": [] }, "execution_count": 21 } ] }, { "cell_type": "code", "metadata": { "id": "rjGKrzWaq6CV", "colab": { "base_uri": "https://localhost:8080/", "height": 513 }, "outputId": "73df4eca-f01f-4ccd-e9fa-9c4711aaf56e" }, "source": [ "ncbi.head()\n" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " Run ReleaseDate LoadDate spots bases spots_with_mates avgLength size_MB AssemblyName download_path Experiment LibraryName LibraryStrategy LibrarySelection LibrarySource LibraryLayout InsertSize InsertDev Platform Model SRAStudy BioProject Study_Pubmed_id ProjectID Sample BioSample SampleType TaxID ScientificName SampleName g1k_pop_code source g1k_analysis_group Subject_ID Sex Disease Tumor Affection_Status Analyte_Type Histological_Type Body_Site CenterName Submission dbgap_study_accession Consent RunHash ReadHash\n", "0 ERR4694533 2020-10-20 15:27:30 NaN 0 0 0 0 0 NaN NaN ERX4615619 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218687 SAMEA7460507 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E7BBA NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3005974 NaN public NaN NaN\n", "1 ERR4694604 2020-10-20 15:36:08 NaN 0 0 0 0 0 NaN NaN ERX4615689 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218756 SAMEA7460576 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E85D9 NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3006200 NaN public NaN NaN\n", "2 ERR4694586 2020-10-20 15:27:31 NaN 0 0 0 0 0 NaN NaN ERX4615671 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218738 SAMEA7460558 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E869A NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH ENGLAND (COLINDALE) ERA3006136 NaN public NaN NaN\n", "3 ERR4694593 2020-10-20 15:27:31 NaN 0 0 0 0 0 NaN NaN ERX4615678 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 ILLUMINA unspecified ERP121228 PRJEB37886 NaN 629258 ERS5218745 SAMEA7460565 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E86E5 NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN PUBLIC HEALTH 
ENGLAND (COLINDALE) ERA3006159 NaN public NaN NaN\n", "4 ERR4694601 2020-10-20 15:36:08 NaN 0 0 0 0 0 NaN NaN ERX4615686 NaN AMPLICON PCR VIRAL RNA SINGLE 0 0 OXFORD_NANOPORE GridION ERP121228 PRJEB37886 NaN 629258 ERS5218753 SAMEA7460573 simple 2697049 Severe acute respiratory syndrome coronavirus 2 COG-UK/ALDP-9E8D3B NaN NaN NaN NaN NaN NaN no NaN NaN NaN NaN CENTRE FOR ENZYME INNOVATION, UNIVERSITY OF PO... ERA3006185 NaN public NaN NaN" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "code", "metadata": { "id": "rjIsbCMWb4GV", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "outputId": "fdb953b0-3c36-43a9-c699-6dc039a21266" }, "source": [ "pysqldf(\"select BioProject, count(*) as N from ncbi where Platform = 'ILLUMINA' group by BioProject order by N\")" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " BioProject N\n", "0 PRJEB38459 1\n", "1 PRJEB39737 1\n", "2 PRJNA608651 1\n", "3 PRJNA623683 1\n", "4 PRJNA623797 1\n", "5 PRJNA623895 1\n", "6 PRJNA624231 1\n", "7 PRJNA625669 1\n", "8 PRJNA626526 1\n", "9 PRJNA633241 1\n", "10 PRJNA635017 1\n", "11 PRJNA636004 1\n", "12 PRJNA637892 1\n", "13 PRJNA647057 1\n", "14 PRJNA657938 1\n", "15 PRJNA657985 1\n", "16 PRJNA658211 1\n", "17 PRJNA658242 1\n", "18 PRJNA666189 1\n", "19 PRJNA231221 2\n", "20 PRJNA624358 2\n", "21 PRJNA624792 2\n", "22 PRJNA629891 2\n", "23 PRJNA630716 2\n", "24 PRJNA643574 2\n", "25 PRJNA657893 2\n", "26 PRJNA667798 2\n", "27 PRJNA669553 2\n", "28 PRJNA670222 2\n", "29 PRJNA637285 3\n", "30 PRJNA644357 3\n", "31 PRJNA607948 4\n", "32 PRJNA616147 4\n", "33 PRJNA627977 4\n", "34 PRJNA634194 4\n", "35 PRJEB39632 5\n", "36 PRJNA628043 5\n", "37 PRJNA639591 5\n", "38 PRJEB38101 6\n", "39 PRJEB38351 6\n", "40 PRJNA615319 6\n", "41 PRJNA627354 6\n", "42 PRJNA647448 7\n", "43 PRJNA668631 7\n", "44 PRJNA605907 8\n", "45 PRJNA605983 9\n", "46 PRJNA662589 9\n", "47 PRJNA616446 10\n", "48 PRJNA638211 10\n", "49 PRJNA666543 10\n", "50 PRJNA627229 11\n", "51 PRJNA636446 11\n", "52 PRJNA666696 12\n", "53 PRJNA663402 14\n", "54 PRJNA645051 17\n", "55 PRJEB38369 18\n", "56 PRJNA622817 18\n", "57 PRJNA645048 21\n", "58 PRJNA650134 22\n", "59 PRJNA659293 23\n", "60 PRJNA614546 24\n", "61 PRJNA645052 30\n", "62 PRJNA662684 33\n", "63 PRJNA629889 43\n", "64 PRJNA639864 46\n", "65 PRJNA632475 47\n", "66 PRJNA631042 49\n", "67 PRJNA634119 49\n", "68 PRJEB39761 50\n", "69 PRJEB38546 52\n", "70 PRJNA648306 56\n", "71 PRJNA650037 60\n", "72 PRJNA634356 67\n", "73 PRJNA666219 69\n", "74 PRJNA649101 72\n", "75 PRJNA643575 85\n", "76 PRJNA667180 102\n", "77 PRJNA627662 112\n", "78 PRJNA639956 164\n", "79 PRJNA656695 171\n", "80 PRJNA633948 204\n", "81 PRJNA656534 209\n", "82 PRJNA647529 212\n", "83 PRJEB37513 244\n", "84 PRJNA631061 254\n", "85 PRJEB40443 346\n", "86 PRJNA662193 400\n", "87 PRJEB39887 
468\n", "88 PRJNA636748 516\n", "89 PRJEB38723 542\n", "90 PRJNA639066 696\n", "91 PRJNA650245 864\n", "92 PRJNA625551 904\n", "93 PRJNA612578 964\n", "94 PRJNA610428 1066\n", "95 PRJNA645906 1322\n", "96 PRJNA622837 1486\n", "97 PRJNA655577 1536\n", "98 PRJNA614995 1983\n", "99 PRJNA613958 10834\n", "100 PRJEB37886 56281" ] }, "metadata": { "tags": [] }, "execution_count": 11 } ] }, { "cell_type": "code", "metadata": { "id": "KuHE7HaQR7OA" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }