{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Set up a WQ-MAKER run using the Jupyter Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1. Get oriented. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will find staged example data in \"/opt/WQ-MAKER_example_data/\" within the MASTER instance. List its contents with the `ls` command:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 116K\r\n", "drwxr-xr-x 4 root root 4.0K Oct 20 17:50 .\r\n", "drwxr-xr-x 19 root root 4.0K Oct 25 18:02 ..\r\n", "-rw-r--r-- 1 root root 37 Oct 20 13:43 .ansible.cfg\r\n", "drwxr-xr-x 8 root root 4.0K Oct 20 17:50 .git\r\n", "-rwxr-xr-x 1 root root 1.4K Sep 11 15:31 maker_bopts.ctl\r\n", "-rwxr-xr-x 1 root root 1.4K Sep 11 15:31 maker_exe.ctl\r\n", "-rwxr-xr-x 1 root root 10 Oct 20 13:44 maker-hosts\r\n", "-rwxr-xr-x 1 root root 4.5K Oct 20 13:11 maker_opts.ctl\r\n", "drwxr-xr-x 2 root root 4.0K Sep 20 15:13 test_data\r\n", "-rwxr-xr-x 1 root root 708 Oct 20 13:55 worker-launch.yml\r\n", "-rw-r--r-- 1 root root 48K Oct 20 17:50 WQ-MAKER-Jupyter-notebook-demo.ipynb\r\n", "-rw-r--r-- 1 root root 23K Oct 20 16:21 WQ-MAKER-Jupyter-notebook.ipynb\r\n" ] } ], "source": [ "!ls -alh /opt/WQ-MAKER_example_data/" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mRNA.fasta\t\t Os-rRNA.fa\t test_genome_chr1.fasta\r\n", "msu-irgsp-proteins.fasta plant_repeats.fasta test_genome.fasta\r\n" ] } ], "source": [ "!ls /opt/WQ-MAKER_example_data/test_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* maker_*.ctl files are a set of configuration files that can be used for this exercise or generated as described below.\n", "* .ansible.cfg, worker-launch.yml and maker-hosts are the Ansible configuration, playbook and hosts file for launching jobs on 
WORKERS (optional for WQ-MAKER)\n", "* FASTA files include two scaled-down genomes: test_genome.fasta, which comprises the first 300 kb of the 12 chromosomes of rice, and test_genome_chr1.fasta, which comprises the first 300 kb of the first chromosome of rice\n", "* mRNA sequences from NCBI (mRNA.fasta)\n", "* publicly available annotated protein sequences of rice, MSU7.0 and IRGSP1.0 (msu-irgsp-proteins.fasta)\n", "* collection of plant repeats (plant_repeats.fasta)\n", "* ribosomal RNA sequence of rice (Os-rRNA.fa)\n", "* Jupyter notebooks for running WQ-MAKER (WQ-MAKER-Jupyter-notebook.ipynb and WQ-MAKER-Jupyter-notebook-demo.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Executables for running MAKER are located in /opt/maker/bin and /opt/maker/exe:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cegma2zff\t gff3_merge\t maker2wap\t\t map_fasta_ids\r\n", "chado2gff3\t iprscan2gff3 maker2zff\t\t map_gff_ids\r\n", "compare\t\t iprscan_wrap maker_functional\t mpi_evaluator\r\n", "cufflinks2gff3\t ipr_update_gff maker_functional_fasta mpi_iprscan\r\n", "evaluator\t maker\t maker_functional_gff tophat2gff3\r\n", "fasta_merge\t maker2chado\t maker_map_ids\r\n", "fasta_tool\t maker2eval_gtf map2assembly\r\n", "genemark_gtf2gff3 maker2jbrowse map_data_ids\r\n" ] } ], "source": [ "!ls /opt/maker/bin/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the names suggest, the **/opt/maker/bin** directory includes many useful auxiliary scripts. For example, cufflinks2gff3 converts output from an RNA-seq analysis into a GFF3 file that can be used as evidence input for WQ-MAKER. RepeatMasker, augustus, blast, exonerate, and snap are programs that MAKER uses in its pipeline. We recommend reading the [MAKER Tutorial](http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014) at GMOD for more information about these." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2. Set up a WQ-MAKER run. Create a working directory called \"wq_maker_run\" on the mounted volume (/vol_b) using the mkdir command and use cd to move into that directory:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Navigate to the mounted volume where the working directory will be created. **This command assumes that you have already created and attached the volume to your MASTER instance.**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/vol_b\n" ] } ], "source": [ "%cd /vol_b" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/vol_b/wq_maker_run\n" ] } ], "source": [ "!mkdir wq_maker_run\n", "%cd wq_maker_run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3. Copy the \"test_data\" directory from \"WQ-MAKER_example_data\" into the current directory using the cp -r command. Verify using the ls command. Change the owner and group of the copied directory to your user" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!sudo cp -r /opt/WQ-MAKER_example_data/test_data .\n", "!sudo chown -hR $USER test_data\n", "!sudo chgrp -hR $USER test_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the MAKER help function:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Argument \"2.53_01\" isn't numeric in numeric ge (>=) at /usr/local/lib/x86_64-linux-gnu/perl/5.22.1/forks.pm line 1570.\n", "\n", "MAKER version 2.31.9\n", "\n", "Usage:\n", "\n", " maker [options] \n", "\n", "\n", "Description:\n", "\n", " MAKER is a program that produces gene annotations in GFF3 format using\n", " evidence such as EST alignments and protein homology. 
MAKER can be used to\n", " produce gene annotations for new genomes as well as update annotations\n", " from existing genome databases.\n", "\n", " The three input arguments are control files that specify how MAKER should\n", " behave. All options for MAKER should be set in the control files, but a\n", " few can also be set on the command line. Command line options provide a\n", " convenient machanism to override commonly altered control file values.\n", " MAKER will automatically search for the control files in the current\n", " working directory if they are not specified on the command line.\n", "\n", " Input files listed in the control options files must be in fasta format\n", " unless otherwise specified. Please see MAKER documentation to learn more\n", " about control file configuration. MAKER will automatically try and\n", " locate the user control files in the current working directory if these\n", " arguments are not supplied when initializing MAKER.\n", "\n", " It is important to note that MAKER does not try and recalculated data that\n", " it has already calculated. For example, if you run an analysis twice on\n", " the same dataset you will notice that MAKER does not rerun any of the\n", " BLAST analyses, but instead uses the blast analyses stored from the\n", " previous run. To force MAKER to rerun all analyses, use the -f flag.\n", "\n", " MAKER also supports parallelization via MPI on computer clusters. Just\n", " launch MAKER via mpiexec (i.e. mpiexec -n 40 maker). MPI support must be\n", " configured during the MAKER installation process for this to work though\n", " \n", "\n", "Options:\n", "\n", " -genome|g Overrides the genome file path in the control files\n", "\n", " -RM_off|R Turns all repeat masking options off.\n", "\n", " -datastore/ Forcably turn on/off MAKER's two deep directory\n", " nodatastore structure for output. 
Always on by default.\n", "\n", " -old_struct Use the old directory styles (MAKER 2.26 and lower)\n", "\n", " -base Set the base name MAKER uses to save output files.\n", " MAKER uses the input genome file name by default.\n", "\n", " -tries|t Run contigs up to the specified number of tries.\n", "\n", " -cpus|c Tells how many cpus to use for BLAST analysis.\n", " Note: this is for BLAST and not for MPI!\n", "\n", " -force|f Forces MAKER to delete old files before running again.\n", "\t\t\t This will require all blast analyses to be rerun.\n", "\n", " -again|a recaculate all annotations and output files even if no\n", "\t\t\t settings have changed. Does not delete old analyses.\n", "\n", " -quiet|q Regular quiet. Only a handlful of status messages.\n", "\n", " -qq Even more quiet. There are no status messages.\n", "\n", " -dsindex Quickly generate datastore index file. Note that this\n", " will not check if run settings have changed on contigs\n", "\n", " -nolock Turn off file locks. May be usful on some file systems,\n", " but can cause race conditions if running in parallel.\n", "\n", " -TMP Specify temporary directory to use.\n", "\n", " -CTL Generate empty control files in the current directory.\n", "\n", " -OPTS Generates just the maker_opts.ctl file.\n", "\n", " -BOPTS Generates just the maker_bopts.ctl file.\n", "\n", " -EXE Generates just the maker_exe.ctl file.\n", "\n", " -MWAS