{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ipyrad-analysis toolkit: sratools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For reproducibility, it helps to download the raw data for your analysis from an online repository such as NCBI with a short script at the top of your notebook. We've written a simple wrapper for the sratools command-line program (which is notoriously difficult to use and poorly documented) to make this easier. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required software" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad -c bioconda \n", "# conda install sratools -c bioconda" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fetch info for a published data set by its accession ID\n", "You can find the study ID or individual sample IDs in published papers or by searching NCBI or related databases. ipyrad accepts one or more accession IDs for individual Runs or Studies (SRR or SRP, and similarly ERR or ERP, etc.). \n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# init sratools object with an accessions argument\n", "sra = ipa.sratools(accessions=\"SRP065788\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "Fetching project data..." ] } ], "source": [ "# fetch info for all samples from this study, save as a dataframe\n", "stable = sra.fetch_runinfo()\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", " | Run | \n", "ReleaseDate | \n", "LoadDate | \n", "spots | \n", "bases | \n", "spots_with_mates | \n", "avgLength | \n", "size_MB | \n", "AssemblyName | \n", "download_path | \n", "... | \n", "SRAStudy | \n", "BioProject | \n", "Study_Pubmed_id | \n", "ProjectID | \n", "Sample | \n", "BioSample | \n", "SampleType | \n", "TaxID | \n", "ScientificName | \n", "SampleName | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "SRR2895732 | \n", "2015-11-04 15:50:01 | \n", "2015-11-04 17:19:15 | \n", "2009174 | \n", "182834834 | \n", "0 | \n", "91 | \n", "116 | \n", "NaN | \n", "https://sra-download.ncbi.nlm.nih.gov/sos/sra-... | \n", "... | \n", "SRP065788 | \n", "PRJNA299402 | \n", "NaN | \n", "299402 | \n", "SRS1146158 | \n", "SAMN04202163 | \n", "simple | \n", "224736 | \n", "Viburnum betulifolium | \n", "Lib1_betulifolium | \n", "
1 | \n", "SRR2895743 | \n", "2015-11-04 15:50:01 | \n", "2015-11-04 17:18:35 | \n", "2452970 | \n", "223220270 | \n", "0 | \n", "91 | \n", "140 | \n", "NaN | \n", "https://sra-download.ncbi.nlm.nih.gov/sos/sra-... | \n", "... | \n", "SRP065788 | \n", "PRJNA299402 | \n", "NaN | \n", "299402 | \n", "SRS1146171 | \n", "SAMN04202164 | \n", "simple | \n", "1220044 | \n", "Viburnum bitchiuense | \n", "Lib1_bitchiuense_combined | \n", "
2 | \n", "SRR2895755 | \n", "2015-11-04 15:50:01 | \n", "2015-11-04 17:18:46 | \n", "4640732 | \n", "422306612 | \n", "0 | \n", "91 | \n", "264 | \n", "NaN | \n", "https://sra-download.ncbi.nlm.nih.gov/sos/sra-... | \n", "... | \n", "SRP065788 | \n", "PRJNA299402 | \n", "NaN | \n", "299402 | \n", "SRS1146182 | \n", "SAMN04202165 | \n", "simple | \n", "237927 | \n", "Viburnum carlesii | \n", "Lib1_carlesii_D1_BP_001 | \n", "
3 | \n", "SRR2895756 | \n", "2015-11-04 15:50:01 | \n", "2015-11-04 17:20:18 | \n", "3719383 | \n", "338463853 | \n", "0 | \n", "91 | \n", "214 | \n", "NaN | \n", "https://sra-download.ncbi.nlm.nih.gov/sos/sra-... | \n", "... | \n", "SRP065788 | \n", "PRJNA299402 | \n", "NaN | \n", "299402 | \n", "SRS1146183 | \n", "SAMN04202166 | \n", "simple | \n", "237928 | \n", "Viburnum cinnamomifolium | \n", "Lib1_cinnamomifolium_PWS2105X | \n", "
4 | \n", "SRR2895757 | \n", "2015-11-04 15:50:01 | \n", "2015-11-04 17:20:06 | \n", "3745852 | \n", "340872532 | \n", "0 | \n", "91 | \n", "213 | \n", "NaN | \n", "https://sra-download.ncbi.nlm.nih.gov/sos/sra-... | \n", "... | \n", "SRP065788 | \n", "PRJNA299402 | \n", "NaN | \n", "299402 | \n", "SRS1146181 | \n", "SAMN04202167 | \n", "simple | \n", "237929 | \n", "Viburnum clemensae | \n", "Lib1_clemensiae_DRY6_PWS_2135 | \n", "
5 rows × 30 columns
\n", "\n", " | Run | \n", "ScientificName | \n", "SampleName | \n", "
---|---|---|---|
0 | \n", "SRR2895732 | \n", "Viburnum betulifolium | \n", "Lib1_betulifolium | \n", "
1 | \n", "SRR2895743 | \n", "Viburnum bitchiuense | \n", "Lib1_bitchiuense_combined | \n", "
2 | \n", "SRR2895755 | \n", "Viburnum carlesii | \n", "Lib1_carlesii_D1_BP_001 | \n", "
3 | \n", "SRR2895756 | \n", "Viburnum cinnamomifolium | \n", "Lib1_cinnamomifolium_PWS2105X | \n", "
4 | \n", "SRR2895757 | \n", "Viburnum clemensae | \n", "Lib1_clemensiae_DRY6_PWS_2135 | \n", "