Welcome! If you would like to analyze Next Generation Sequencing (NGS) data with Ceto, you first need to do the following:

- Get an account on Quest. You will need storage space of your own to store your data.
  http://www.it.northwestern.edu/secure/forms/research/allocation-request.html
- Request access to allocation p20742 (owned by Elizabeth Bartom, ebartom@northwestern.edu). This allocation contains the Ceto code as well as a variety of reference genomes and transcriptomes and some test data.
  https://app.smartsheet.com/b/form?EQBCT=71afee7a6a014e94b1299fa947ef43c5
- Request access to the genomics nodes on Quest.
  http://www.it.northwestern.edu/research/user-services/quest/genomics.html
- Try logging on to Quest.
  http://www.it.northwestern.edu/research/user-services/quest/logon.html

Once you have access to Quest, you need to install the R packages used by Ceto. You can choose to install only those genomes relevant to your research. Copy and paste the following into the command line:

    # load R
    module load R/3.2.2
    # open R
    R
    # Set up R packages
    source("http://bioconductor.org/biocLite.R")
    biocLite("BSgenome.Mmusculus.UCSC.mm10")
    biocLite("BSgenome.Mmusculus.UCSC.mm9")
    biocLite("BSgenome.Hsapiens.UCSC.hg19")
    biocLite("BSgenome.Hsapiens.UCSC.hg38")
    biocLite("BSgenome.Scerevisiae.UCSC.sacCer3")
    biocLite("ChIPpeakAnno")
    biocLite("topGO")
    # exit R
    q()

You will get an error pointing out that you cannot install R libraries for everyone. That’s fine. Accept the program’s proposal that R libraries be installed in your personal directory.

Next, transfer your sequence data onto Quest:
http://www.it.northwestern.edu/research/user-services/quest/filetransfer.html

You are now ready to start analyzing your data!

RNAseq data

Log on to Quest.

    cd /projects/p20742/testRNA/
    # Make a directory for your analysis results (replace b1025 with your own allocation throughout)
    outputDirectory=/projects/b1025/testOutput/
    mkdir $outputDirectory
    # Change to the directory you just made.
    cd $outputDirectory
    # Build the pipeline scripts in that directory.
    /projects/p20742/tools/buildPipelineScripts.pl \
        -t RNA \
        -o $outputDirectory \
        -g mm10 \
        -f /projects/p20742/testRNA/fastq \
        -c /projects/p20742/testRNA/comparisons.csv \
        -uploadASHtracks 0 \
        -runAlign 1 \
        -runEdgeR 1 \
        >& buildPipelineScripts.testRNA.log &
    # Check on the status of your jobs running on the cluster
    showq -u ${USER}
    # Look at the arguments we used above:
    # The fastq directory
    ls /projects/p20742/testRNA/fastq
    # The comparisons file
    more /projects/p20742/testRNA/comparisons.csv
    # Look for files created by Ceto:
    ls $outputDirectory/testRNA/
    # Poke around in the resulting files
    more $outputDirectory/testRNA/scripts/run_mESc-2i-RNA-DMSO-REP1_align.sh
    # These are the commands that are being run by Ceto
    # Wait about 8 hours for Ceto to finish running, and then explore the contents of the output directory
    # Try a different fastq directory and comparisons file. Can you run your own analysis?
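
For the final exercise (running Ceto on your own fastq directory and comparisons file), it may help to check your inputs before submitting anything to the cluster. The following is a minimal sketch and is not part of Ceto: `check_inputs` and `print_build_command` are hypothetical helper names, and the assumption that read files have `.fastq` somewhere in their names may not match your data. The flags are copied from the RNAseq example above; the sketch only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not shipped with Ceto.

# Return 0 if the fastq directory and comparisons file both look usable.
check_inputs() {
    local fastq_dir=$1 comparisons=$2
    if [ ! -d "$fastq_dir" ]; then
        echo "Missing fastq directory: $fastq_dir" >&2
        return 1
    fi
    # Assumption: read files contain ".fastq" in their names (e.g. x.fastq.gz).
    if [ -z "$(find "$fastq_dir" -maxdepth 1 -name '*.fastq*' | head -n 1)" ]; then
        echo "No fastq files found in: $fastq_dir" >&2
        return 1
    fi
    if [ ! -f "$comparisons" ]; then
        echo "Missing comparisons file: $comparisons" >&2
        return 1
    fi
    return 0
}

# Print (but do not run) the build command with the same flags as the example.
print_build_command() {
    local out_dir=$1 genome=$2 fastq_dir=$3 comparisons=$4
    printf '%s -t RNA -o %s -g %s -f %s -c %s -uploadASHtracks 0 -runAlign 1 -runEdgeR 1\n' \
        /projects/p20742/tools/buildPipelineScripts.pl \
        "$out_dir" "$genome" "$fastq_dir" "$comparisons"
}
```

One way to use it: call `check_inputs` with your own fastq directory and comparisons file, and only if it succeeds call `print_build_command` with your output directory, genome name, and the same two paths. Review the printed command, then run it yourself with the log redirection shown in the RNAseq example.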