{ "metadata": { "name": "", "signature": "sha256:991360c45d4dcdf7afc02d5c21ea9172cca1c38a7a5049cb029c0caffcc734bf" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Tools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In developing qdod there are several handy tools for aggretating data. Here we will try to list them." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Downloading data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Section on how to get data automatically down from different resources including \n", "\n", "- NCBI \n", "- Ensembl \n", "- Sigenae \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sqlshare** \n", "`!python {spd}fetchdata.py -s \"SELECT ProbeName,GB_ACC,Column2 as sequence,Description FROM [sr320@washington.edu].[GPL11353_array]ar left join [sr320@washington.edu].[Cgigas_EST__Nuc_NCBI_040414_cl]est on ar.GB_ACC=est.Column5\" -f tsv -o /Volumes/web/cnidarian/GPL11353_fasta2.tab`\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Uploading data" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "#uploading to SQLSHARE\n", "spd=\"/Users/sr320/sqlshare-pythonclient/tools/\"\n", "!python {spd}singleupload.py -d Cgigas_EST_NCBI_040414_cl /Volumes/web/cnidarian/Cgigas_EST_NCBI_040414_cl.tab" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Manipulating " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Fasta to tab**\n", "```\n", "!perl -e '$count=0; $len=0; while(<>) {s/\\r?\\n//; s/\\t/ /g; if (s/^>//) { if ($. != 1) {print \"\\n\"} s/ |$/\\t/; $count++; $_ .= \"\\t\";} else {s/ //g; $len += length($_)} print $_;} print \"\\n\"; warn \"\\nConverted $count FASTA records in $. lines to tabular format\\nTotal sequence length: $len\\n\\n\";' /Volumes/web/cnidarian/oyster.v9.fa > /Volumes/web/cnidarian/cgigas_v9_genome01.tab\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Tab to fasta**\n", "`!awk -F \",\" '{print \">\"$1\"\\n\"$2}' /Volumes/web/cnidarian/GPL11353_v6fasta.csv > /Volumes/web/cnidarian/GPL11353_v6fasta.fa`" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }