{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TODAY'S DATE:\n", "Wed Aug 22 14:10:02 PDT 2018\n", "------------\n", "\n", "Distributor ID:\tUbuntu\n", "Description:\tUbuntu 16.04.5 LTS\n", "Release:\t16.04\n", "Codename:\txenial\n", "\n", "------------\n", "HOSTNAME: \n", "roadrunner\n", "\n", "------------\n", "Computer Specs:\n", "\n", "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 16\n", "On-line CPU(s) list: 0-15\n", "Thread(s) per core: 2\n", "Core(s) per socket: 4\n", "Socket(s): 2\n", "NUMA node(s): 1\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 26\n", "Model name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz\n", "Stepping: 5\n", "CPU MHz: 1596.000\n", "CPU max MHz: 2394.0000\n", "CPU min MHz: 1596.0000\n", "BogoMIPS: 4521.80\n", "Virtualization: VT-x\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 256K\n", "L3 cache: 8192K\n", "NUMA node0 CPU(s): 0-15\n", "Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm kaiser tpr_shadow vnmi flexpriority ept vpid dtherm ida\n", "\n", "------------\n", "\n", "Memory Specs\n", "\n", " total used free shared buff/cache available\n", "Mem: 47G 1.1G 44G 78M 1.3G 45G\n", "Swap: 47G 0B 47G\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "No LSB modules are available.\n" ] } ], "source": [ "%%bash\n", "echo \"TODAY'S DATE:\"\n", "date\n", "echo \"------------\"\n", "echo \"\"\n", "#Display operating system info\n", "lsb_release -a\n", "echo \"\"\n", "echo \"------------\"\n", "echo \"HOSTNAME: \"; hostname \n", "echo \"\"\n", "echo \"------------\"\n", "echo \"Computer Specs:\"\n", "echo \"\"\n", "lscpu\n", "echo \"\"\n", "echo \"------------\"\n", "echo \"\"\n", "echo \"Memory Specs\"\n", "echo \"\"\n", "free -mh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create necessary directories" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mkdir /home/sam/data/Cvirginica_genome\n", "mkdir /home/sam/analyses/20180822_virginica_repeatmasker_all\n", "mkdir /home/sam/analyses/20180822_virginica_repeatmasker_Cgigas\n", "mkdir /home/sam/analyses/20180822_virginica_repeatmasker_Cvirginica\n", "mkdir /home/sam/analyses/20180822_virginica_repeatmasker_defaults" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download _Crassostrea virginica_ genome FastA file\n", "\n", "Info on FastA file is here: [https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome-1](https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome-1)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 662M\n", "-rw-rw-r-- 1 sam sam 662M Jun 7 14:40 Cvirginica_v300.fa\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t0m8.624s\n", "user\t0m0.504s\n", "sys\t0m3.072s\n" ] } ], "source": [ "%%bash\n", "time \\\n", "wget http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa \\\n", "--quiet \\\n", "--directory-prefix=/home/sam/data/Cvirginica_genome/\n", "\n", "ls -lh /home/sam/data/Cvirginica_genome/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Verify MD5 checksum\n", "\n", "Original MD5 checksum taken from GitHub Genomic Resource linked above.\n", "\n", "Use ```md5sum``` to generate checksum from downloaded FastA file and ```awk``` to print the first field (i.e. the checksum value). This is saved to the variable: ```dl_md5```\n", "\n", "Then, check for differences between the two variables. \n", "\n", "No output confirms no difference." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "md5=f9135e323583dc77fc726e9df2677a32\n", "dl_md5=$(md5sum /home/sam/data/Cvirginica_genome/Cvirginica_v300.fa | awk '{ print $1 }')\n", "diff <(echo \"$md5\") <(echo \"$dl_md5\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run RepeatMasker with _all_ species setting and following options:\n", "\n", "```-species \"all\"``` : Sets species to all\n", "\n", "```-par 15``` : Use 15 CPU threads\n", "\n", "```-gff``` : Create GFF output file (in addition to default files)\n", "\n", "```-excln``` : Adjusts output table calculations to exclude sequence runs of >=25Ns. Useful for draft genome assemblies.\n", "\n", "```-1>``` : Send stdout to file instead of printing to notebook.\n", "\n", "```-2>``` : Send stderr to file instead of printing to notebook.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t1806m14.965s\n", "user\t27975m51.556s\n", "sys\t137m10.360s\n", "mv: cannot move './' to '/home/sam/analyses/20180822_virginica_repeatmasker_all/.': Device or resource busy\n" ] } ], "source": [ "%%bash\n", "wd=\"/home/sam/analyses/20180822_virginica_repeatmasker_all\"\n", "cd /home/sam/data/Cvirginica_genome/\n", "time \\\n", "/home/shared/RepeatMasker-4.0.7/RepeatMasker \\\n", "/home/sam/data/Cvirginica_genome/Cvirginica_v300.fa \\\n", "-species \"all\" \\\n", "-par 15 \\\n", "-gff \\\n", "-excln \\\n", "1> \"$wd\"/stdout.out \\\n", "2> \"$wd\"/stderr.err\n", "find ./ -not -name Cvirginica_v300.fa -exec mv '{}' \"$wd\" \\;\n", "sed '/^Subject:/ s/ / repeatmasker_virginica_all JOB COMPLETE/' ~/.default-subject.mail | msmtp \"$EMAIL\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/data/Cvirginica_genome/" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/analyses/20180822_virginica_repeatmasker_all/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run RepeatMasker with _crassostrea gigas_ species setting and following options:\n", "\n", "```-species \"crassostrea gigas\"``` : Sets species to _Crassostrea gigas_\n", "\n", "```-par 15``` : Use 15 CPU threads\n", "\n", "```-gff``` : Create GFF output file (in addition to default files)\n", "\n", "```-excln``` : Adjusts output table calculations to exclude sequence runs of >=25Ns. Useful for draft genome assemblies.\n", "\n", "```-1>``` : Send stdout to file instead of printing to notebook.\n", "\n", "```-2>``` : Send stderr to file instead of printing to notebook.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t161m24.668s\n", "user\t1801m15.348s\n", "sys\t125m13.016s\n", "mv: cannot move './' to '/home/sam/analyses/20180822_virginica_repeatmasker_Cgigas/.': Device or resource busy\n" ] } ], "source": [ "%%bash\n", "wd=\"/home/sam/analyses/20180822_virginica_repeatmasker_Cgigas\"\n", "cd /home/sam/data/Cvirginica_genome/\n", "time \\\n", "/home/shared/RepeatMasker-4.0.7/RepeatMasker \\\n", "/home/sam/data/Cvirginica_genome/Cvirginica_v300.fa \\\n", "-species \"crassostrea gigas\" \\\n", "-par 15 \\\n", "-gff \\\n", "-excln \\\n", "1> \"$wd\"/stdout.out \\\n", "2> \"$wd\"/stderr.err\n", "find ./ -not -name Cvirginica_v300.fa -exec mv '{}' \"$wd\" \\;\n", "sed '/^Subject:/ s/ / repeatmasker_virginica_gigas JOB COMPLETE/' ~/.default-subject.mail | msmtp \"$EMAIL\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/data/Cvirginica_genome/" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/analyses/20180822_virginica_repeatmasker_Cgigas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run RepeatMasker with _crassostrea virginica_ species setting and following options:\n", "\n", "```-species \"crassostrea virginica\"``` : Sets species to _Crassostrea virginica_\n", "\n", "```-par 15``` : Use 15 CPU threads\n", "\n", "```-gff``` : Create GFF output file (in addition to default files)\n", "\n", "```-excln``` : Adjusts output table calculations to exclude sequence runs of >=25Ns. Useful for draft genome assemblies.\n", "\n", "```-1>``` : Send stdout to file instead of printing to notebook.\n", "\n", "```-2>``` : Send stderr to file instead of printing to notebook." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t88m45.684s\n", "user\t911m23.604s\n", "sys\t139m47.128s\n", "mv: cannot move './' to '/home/sam/analyses/20180822_virginica_repeatmasker_Cvirginica/.': Device or resource busy\n" ] } ], "source": [ "%%bash\n", "wd=\"/home/sam/analyses/20180822_virginica_repeatmasker_Cvirginica\"\n", "cd /home/sam/data/Cvirginica_genome/\n", "time \\\n", "/home/shared/RepeatMasker-4.0.7/RepeatMasker \\\n", "/home/sam/data/Cvirginica_genome/Cvirginica_v300.fa \\\n", "-species \"crassostrea virginica\" \\\n", "-par 15 \\\n", "-gff \\\n", "-excln \\\n", "1> \"$wd\"/stdout.out \\\n", "2> \"$wd\"/stderr.err\n", "find ./ -not -name Cvirginica_v300.fa -exec mv '{}' \"$wd\" \\;\n", "sed '/^Subject:/ s/ / repeatmasker_virginica_virginica JOB COMPLETE/' ~/.default-subject.mail | msmtp \"$EMAIL\"" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/data/Cvirginica_genome/" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/analyses/20180822_virginica_repeatmasker_Cvirginica" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run RepeatMasker with _defaults_ species setting and following options:\n", "\n", "```-par 15``` : Use 15 CPU threads\n", "\n", "```-gff``` : Create GFF output file (in addition to default files)\n", "\n", "```-excln``` : Adjusts output table calculations to exclude sequence runs of >=25Ns. Useful for draft genome assemblies.\n", "\n", "```-1>``` : Send stdout to file instead of printing to notebook.\n", "\n", "```-2>``` : Send stderr to file instead of printing to notebook." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t148m33.929s\n", "user\t1883m54.940s\n", "sys\t175m28.320s\n", "mv: cannot move './' to '/home/sam/analyses/20180822_virginica_repeatmasker_defaults/.': Device or resource busy\n" ] } ], "source": [ "%%bash\n", "wd=\"/home/sam/analyses/20180822_virginica_repeatmasker_defaults\"\n", "cd /home/sam/data/Cvirginica_genome/\n", "time \\\n", "/home/shared/RepeatMasker-4.0.7/RepeatMasker \\\n", "/home/sam/data/Cvirginica_genome/Cvirginica_v300.fa \\\n", "-par 15 \\\n", "-gff \\\n", "-excln \\\n", "1> \"$wd\"/stdout.out \\\n", "2> \"$wd\"/stderr.err\n", "find ./ -not -name Cvirginica_v300.fa -exec mv '{}' \"$wd\" \\;\n", "sed '/^Subject:/ s/ / repeatmasker_virginica_defaults JOB COMPLETE/' ~/.default-subject.mail | msmtp \"$EMAIL\"" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/data/Cvirginica_genome/" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n" ] } ], "source": [ "%%bash\n", "ls /home/sam/analyses/20180822_virginica_repeatmasker_defaults" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SUMMARY TABLE (species=all)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", "file name: Cvirginica_v300.fa \n", "sequences: 11\n", "total length: 684741128 bp (684675328 bp excl N/X-runs)\n", "GC level: 34.83 %\n", "bases masked: 113771462 bp ( 16.62 %)\n", "==================================================\n", " number of length percentage\n", " elements* occupied of sequence\n", "--------------------------------------------------\n", "Retroelements 97003 27946871 bp 4.08 %\n", " SINEs: 48145 9242559 bp 1.35 %\n", " Penelope 1429 256929 bp 0.04 %\n", " LINEs: 27022 10570154 bp 1.54 %\n", " CRE/SLACS 28 2219 bp 0.00 %\n", " L2/CR1/Rex 2160 316660 bp 0.05 %\n", " R1/LOA/Jockey 3058 386611 bp 0.06 %\n", " R2/R4/NeSL 511 226938 bp 0.03 %\n", " RTE/Bov-B 7377 3276312 bp 0.48 %\n", " L1/CIN4 1331 95476 bp 0.01 %\n", " LTR elements: 21836 8134158 bp 1.19 %\n", " BEL/Pao 1807 936488 bp 0.14 %\n", " Ty1/Copia 3046 296183 bp 0.04 %\n", " Gypsy/DIRS1 12789 6060883 bp 0.89 %\n", " Retroviral 2369 152228 bp 0.02 %\n", "\n", "DNA transposons 180693 29492426 bp 4.31 %\n", " hobo-Activator 12869 1114188 bp 0.16 %\n", " Tc1-IS630-Pogo 17233 2485049 bp 0.36 %\n", " En-Spm 0 0 bp 0.00 %\n", " MuDR-IS905 0 0 bp 0.00 %\n", " PiggyBac 2388 405926 bp 0.06 %\n", " Tourist/Harbinger 9302 992476 bp 0.14 %\n", " Other (Mirage, 238 15946 bp 0.00 %\n", " P-element, Transib)\n", "\n", "Rolling-circles 0 0 bp 0.00 %\n", "\n", "Unclassified: 137707 45460608 bp 6.64 %\n", "\n", "Total interspersed repeats: 102899905 bp 15.03 %\n", "\n", "\n", "Small RNA: 45243 9057873 bp 1.32 %\n", "\n", "Satellites: 3852 760316 bp 0.11 %\n", "Simple repeats: 203542 8946510 bp 1.31 %\n", "Low complexity: 26205 1281043 bp 0.19 %\n", "==================================================\n", "\n", "* most repeats fragmented by insertions or deletions\n", " have been counted as one element\n", " Runs of >=20 X/Ns in query were excluded in % calcs\n", "\n", "\n", "The query species was assumed to be root \n", "RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127\n", " \n", "run with rmblastn version 2.6.0+\n", "\n" ] } ], "source": [ "%%bash\n", "cat /home/sam/analyses/20180822_virginica_repeatmasker_all/Cvirginica_v300.fa.tbl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SUMMARY TABLE (species=_Crassostrea gigas_)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", "file name: Cvirginica_v300.fa \n", "sequences: 11\n", "total length: 684741128 bp (684675328 bp excl N/X-runs)\n", "GC level: 34.83 %\n", "bases masked: 93923386 bp ( 13.72 %)\n", "==================================================\n", " number of length percentage\n", " elements* occupied of sequence\n", "--------------------------------------------------\n", "Retroelements 26397 15008601 bp 2.19 %\n", " SINEs: 4 722 bp 0.00 %\n", " Penelope 675 190160 bp 0.03 %\n", " LINEs: 17645 8922188 bp 1.30 %\n", " CRE/SLACS 0 0 bp 0.00 %\n", " L2/CR1/Rex 70 39188 bp 0.01 %\n", " R1/LOA/Jockey 0 0 bp 0.00 %\n", " R2/R4/NeSL 4 5110 bp 0.00 %\n", " RTE/Bov-B 6194 2718955 bp 0.40 %\n", " L1/CIN4 0 0 bp 0.00 %\n", " LTR elements: 8748 6085691 bp 0.89 %\n", " BEL/Pao 933 788887 bp 0.12 %\n", " Ty1/Copia 47 82743 bp 0.01 %\n", " Gypsy/DIRS1 6819 4822734 bp 0.70 %\n", " Retroviral 0 0 bp 0.00 %\n", "\n", "DNA transposons 163945 26422122 bp 3.86 %\n", " hobo-Activator 7742 720623 bp 0.11 %\n", " Tc1-IS630-Pogo 15615 2328538 bp 0.34 %\n", " En-Spm 0 0 bp 0.00 %\n", " MuDR-IS905 0 0 bp 0.00 %\n", " PiggyBac 2246 393498 bp 0.06 %\n", " Tourist/Harbinger 8431 876020 bp 0.13 %\n", " Other (Mirage, 0 0 bp 0.00 %\n", " P-element, Transib)\n", "\n", "Rolling-circles 0 0 bp 0.00 %\n", "\n", "Unclassified: 160681 41266796 bp 6.03 %\n", "\n", "Total interspersed repeats: 82697519 bp 12.08 %\n", "\n", "\n", "Small RNA: 214 40811 bp 0.01 %\n", "\n", "Satellites: 1396 217317 bp 0.03 %\n", "Simple repeats: 216869 9637447 bp 1.41 %\n", "Low complexity: 27520 1418990 bp 0.21 %\n", "==================================================\n", "\n", "* most repeats fragmented by insertions or deletions\n", " have been counted as one element\n", " Runs of >=20 X/Ns in query were excluded in % calcs\n", "\n", "\n", "The query species was assumed to be crassostrea gigas\n", "RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127\n", " \n", "run with rmblastn version 2.6.0+\n", "\n" ] } ], "source": [ "%%bash\n", "cat /home/sam/analyses/20180822_virginica_repeatmasker_Cgigas/Cvirginica_v300.fa.tbl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SUMMARY TABLE (species=_Crassostrea virginica_)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", "file name: Cvirginica_v300.fa \n", "sequences: 11\n", "total length: 684741128 bp (684675328 bp excl N/X-runs)\n", "GC level: 34.83 %\n", "bases masked: 46637065 bp ( 6.81 %)\n", "==================================================\n", " number of length percentage\n", " elements* occupied of sequence\n", "--------------------------------------------------\n", "Retroelements 43139 8952068 bp 1.31 %\n", " SINEs: 43139 8952068 bp 1.31 %\n", " Penelope 0 0 bp 0.00 %\n", " LINEs: 0 0 bp 0.00 %\n", " CRE/SLACS 0 0 bp 0.00 %\n", " L2/CR1/Rex 0 0 bp 0.00 %\n", " R1/LOA/Jockey 0 0 bp 0.00 %\n", " R2/R4/NeSL 0 0 bp 0.00 %\n", " RTE/Bov-B 0 0 bp 0.00 %\n", " L1/CIN4 0 0 bp 0.00 %\n", " LTR elements: 0 0 bp 0.00 %\n", " BEL/Pao 0 0 bp 0.00 %\n", " Ty1/Copia 0 0 bp 0.00 %\n", " Gypsy/DIRS1 0 0 bp 0.00 %\n", " Retroviral 0 0 bp 0.00 %\n", "\n", "DNA transposons 3538 1564942 bp 0.23 %\n", " hobo-Activator 0 0 bp 0.00 %\n", " Tc1-IS630-Pogo 0 0 bp 0.00 %\n", " En-Spm 0 0 bp 0.00 %\n", " MuDR-IS905 0 0 bp 0.00 %\n", " PiggyBac 0 0 bp 0.00 %\n", " Tourist/Harbinger 0 0 bp 0.00 %\n", " Other (Mirage, 0 0 bp 0.00 %\n", " P-element, Transib)\n", "\n", "Rolling-circles 0 0 bp 0.00 %\n", "\n", "Unclassified: 65151 23982146 bp 3.50 %\n", "\n", "Total interspersed repeats: 34499156 bp 5.04 %\n", "\n", "\n", "Small RNA: 43353 8992879 bp 1.31 %\n", "\n", "Satellites: 1 222 bp 0.00 %\n", "Simple repeats: 232627 10544162 bp 1.54 %\n", "Low complexity: 29762 1561018 bp 0.23 %\n", "==================================================\n", "\n", "* most repeats fragmented by insertions or deletions\n", " have been counted as one element\n", " Runs of >=20 X/Ns in query were excluded in % calcs\n", "\n", "\n", "The query species was assumed to be crassostrea virginica\n", "RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127\n", " \n", "run with rmblastn version 2.6.0+\n", "\n" ] } ], "source": [ "%%bash\n", "cat /home/sam/analyses/20180822_virginica_repeatmasker_Cvirginica/Cvirginica_v300.fa.tbl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SUMMARY TABLE (species=default)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", "file name: Cvirginica_v300.fa \n", "sequences: 11\n", "total length: 684741128 bp (684675328 bp excl N/X-runs)\n", "GC level: 34.83 %\n", "bases masked: 13461422 bp ( 1.97 %)\n", "==================================================\n", " number of length percentage\n", " elements* occupied of sequence\n", "--------------------------------------------------\n", "SINEs: 2056 120820 bp 0.02 %\n", " ALUs 0 0 bp 0.00 %\n", " MIRs 240 14635 bp 0.00 %\n", "\n", "LINEs: 3408 331585 bp 0.05 %\n", " LINE1 240 16835 bp 0.00 %\n", " LINE2 728 69177 bp 0.01 %\n", " L3/CR1 1369 135234 bp 0.02 %\n", "\n", "LTR elements: 704 236625 bp 0.03 %\n", " ERVL 14 944 bp 0.00 %\n", " ERVL-MaLRs 12 892 bp 0.00 %\n", " ERV_classI 272 36695 bp 0.01 %\n", " ERV_classII 4 206 bp 0.00 %\n", "\n", "DNA elements: 1088 100026 bp 0.01 %\n", " hAT-Charlie 27 1543 bp 0.00 %\n", " TcMar-Tigger 142 9891 bp 0.00 %\n", "\n", "Unclassified: 57 6096 bp 0.00 %\n", "\n", "Total interspersed repeats: 795152 bp 0.12 %\n", "\n", "\n", "Small RNA: 3698 279669 bp 0.04 %\n", "\n", "Satellites: 73 5524 bp 0.00 %\n", "Simple repeats: 247957 10848509 bp 1.58 %\n", "Low complexity: 30084 1536314 bp 0.22 %\n", "==================================================\n", "\n", "* most repeats fragmented by insertions or deletions\n", " have been counted as one element\n", " Runs of >=20 X/Ns in query were excluded in % calcs\n", "\n", "\n", "The query species was assumed to be homo sapiens \n", "RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127\n", " \n", "run with rmblastn version 2.6.0+\n", "\n" ] } ], "source": [ "%%bash\n", "cat /home/sam/analyses/20180822_virginica_repeatmasker_defaults/Cvirginica_v300.fa.tbl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Files copied to owl/Athaliana outside of notebook due to ```sudo``` requirement." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/mnt/owl/Athaliana/20180822_virginica_repeatmasker_all:\n", "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n", "\n", "/mnt/owl/Athaliana/20180822_virginica_repeatmasker_Cgigas:\n", "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n", "\n", "/mnt/owl/Athaliana/20180822_virginica_repeatmasker_Cvirginica:\n", "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n", "\n", "/mnt/owl/Athaliana/20180822_virginica_repeatmasker_defaults:\n", "Cvirginica_v300.fa.cat.gz\n", "Cvirginica_v300.fa.masked\n", "Cvirginica_v300.fa.out\n", "Cvirginica_v300.fa.out.gff\n", "Cvirginica_v300.fa.tbl\n", "stderr.err\n", "stdout.out\n" ] } ], "source": [ "%%bash\n", "ls /mnt/owl/Athaliana/20180822_virginica_repeatmasker_*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }