# Installations

This includes the required installations for the first steps of the SwarmGenomics pipeline. Installations for the later analyses are provided in the corresponding instructions. These instructions mainly use Miniconda, but you may use other methods if you prefer. You may need to adapt certain installations as newer package versions are released.

## Preprocessing

This includes all the installations for steps 1-3.

### Miniconda3

First, install Miniconda3:

```bash
# Download the installer from the website
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Select "yes" for everything; if you are working on a de.NBI VM,
# change the storage location to /vol/storage/miniconda3

# Activate conda
conda activate
```

### Conda installations

```bash
conda install bioconda::fastqc
conda install bioconda::bwa
conda install bioconda::bwa-mem2
conda install bioconda::vcftools
conda install conda-forge::r-splitstackshape
conda install bioconda::bedtools
conda install bioconda::tabix
conda install bioconda::samtools
conda install bioconda::trimmomatic
conda install bioconda::seqkit
conda install -c conda-forge ncurses
conda install -c bioconda htslib
```

### SRA tools

You may also install this with Conda, but make sure you get the latest version.

```bash
# Run from the software directory, e.g. /vol/storage/software
cd /vol/storage/software
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz
tar -xzf sratoolkit.current-centos_linux64.tar.gz
rm sratoolkit.current-centos_linux64.tar.gz
```

### Bcftools

```bash
# Run from the software directory, e.g. /vol/storage/software
cd /vol/storage/software
wget https://github.com/samtools/bcftools/releases/download/1.19/bcftools-1.19.tar.bz2
tar -xjf bcftools-1.19.tar.bz2
cd bcftools-1.19
./configure --prefix=/vol/storage/software/bcftools-1.19
make
rm /vol/storage/software/bcftools-1.19.tar.bz2
```

### Trimmomatic adapters

This will create a file with adapters for Trimmomatic.
```bash
# Create a directory for adapters
mkdir -p /vol/storage/software/trimmomatics

# Write the adapter sequences to adapters.fa
# The sequences come from GitHub (e.g. NexteraPE-PE.fa, TruSeq3-PE-2.fa):
# https://github.com/timflutre/trimmomatic/tree/master/adapters
cat > /vol/storage/software/trimmomatics/adapters.fa << 'EOF'
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
EOF
```

## Genomic Modules

Below are all the installations for the genomic modules, steps 4-12.

### PSMC

```bash
# Clone the repository
git clone https://github.com/lh3/psmc.git

# Make psmc
cd psmc
make

# Make utils
cd utils
make
```

### R and R packages for plotting

```bash
# Install R
conda install r-base r-essentials
```

```r
# Start R from the shell by typing: R
# Then install the R packages
install.packages(c("ggplot2", "dplyr", "gridExtra", "RColorBrewer", "patchwork", "data.table", "tidyr"))
```

### Mitochondrial Genome Reconstruction

```bash
# GetOrganelle installation
conda create -n getorganelle -c bioconda getorganelle

# Activate environment
conda activate getorganelle

# Install configuration
get_organelle_config.py -a animal_mt

# Deactivate
conda deactivate
```

### Phylogenetic Tree visualisation

```bash
sudo apt install ncbi-blast+
sudo apt install seqkit
sudo apt install mafft
sudo apt install fasttree
pip install toytree toyplot biopython
```

### NUMT Identification

```bash
# R
conda create -n r_env r-essentials r-base
conda activate r_env

# Blast
conda install bioconda::blast

# Bedtools (you should already have this installed)
conda install bioconda::bedtools

# Deactivate
conda deactivate
```

### Unmapped Reads
```bash
# Install Kraken2 (version 2.1.3) in its own environment called kraken
conda create -n kraken -c bioconda kraken2=2.1.3

# If you encounter an error while solving the environment, try:
conda config --env --add channels conda-forge

# Activate kraken
conda activate kraken

# Install KrakenTools
conda install bioconda::krakentools
```

You will also need to create or download a database. You may choose one that suits your needs from https://benlangmead.github.io/aws-indexes/k2. Here we download the PlusPF-16 database.

```bash
# Make a directory for the database
mkdir kraken_database

# Download the database into the directory
cd kraken_database
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16gb_20240112.tar.gz
tar -xvzf k2_pluspf_16gb_20240112.tar.gz
```

### Genome Visualisation

```bash
conda install bioconda::mosdepth
```

Circos software:

```bash
# Change to the directory where you want to install Circos
# E.g. /vol/storage/software
cd /vol/storage/software

# Get the latest Circos download from https://circos.ca/software/download/
wget --no-check-certificate https://circos.ca/distribution/circos-0.69-9.tgz
tar -xvzf circos-0.69-9.tgz
```

For more installation instructions, visit https://circos.ca/software/installation/

```bash
# Install dependencies
# Install libraries
sudo apt-get install -y libgd-dev

# Install Perl GD module with Conda
conda install -c conda-forge perl-gd

# Install perl-Params-Validate module with Conda
conda install -c conda-forge perl-params-validate
```

Install Perl modules:

```bash
# Enter the CPAN shell
cpan

# Install modules (type these at the CPAN shell prompt)
install Readonly
install Font::TTF::Font
install Math::Bezier
install Math::Round
install Config::General
install GD
install Set::IntSpan
install List::MoreUtils
install GD::Polyline
install Math::VecStat
install SVG
install Params::Validate
install Regexp::Common
install Text::Format
install Statistics::Basic

# To exit the CPAN shell
exit
```

### Repeat Analysis

Change the installation locations accordingly.
```bash
# Install HMMER
cd /vol/storage/software
wget http://eddylab.org/software/hmmer/hmmer-3.3.2.tar.gz
tar -xzf hmmer-3.3.2.tar.gz
rm /vol/storage/software/hmmer-3.3.2.tar.gz
cd hmmer-3.3.2
./configure --prefix=/vol/storage/software/hmmer-3.3.2
make
make install

# Install RMBlast
cd /vol/storage/software
wget http://www.repeatmasker.org/rmblast/rmblast-2.14.0+-x64-linux.tar.gz
tar -xzf rmblast-2.14.0+-x64-linux.tar.gz
rm rmblast-2.14.0+-x64-linux.tar.gz

# Install TRF
cd /vol/storage/software
wget https://github.com/Benson-Genomics-Lab/TRF/archive/refs/tags/v4.09.1.tar.gz
tar -xzf /vol/storage/software/v4.09.1.tar.gz
rm /vol/storage/software/v4.09.1.tar.gz
mkdir /vol/storage/software/TRF-4.09.1/build
cd /vol/storage/software/TRF-4.09.1/build
/vol/storage/software/TRF-4.09.1/configure --prefix=/vol/storage/software/TRF-4.09.1
make
make install

# Install RepeatMasker
cd /vol/storage/software
wget https://www.repeatmasker.org/RepeatMasker/RepeatMasker-4.1.5.tar.gz
tar -xzf RepeatMasker-4.1.5.tar.gz
rm RepeatMasker-4.1.5.tar.gz

# Configure RepeatMasker
perl /vol/storage/software/RepeatMasker/configure

# Paths to enter when the RepeatMasker configure script asks for them:
#   /vol/storage/software/TRF-4.09.1/bin/trf
#   /vol/storage/software/rmblast-2.14.0/bin
#   /vol/storage/software/hmmer-3.3.2/bin
#
# Selections in the "Add a Search Engine" menu:
#   1. Crossmatch: [ Un-configured ]
#   2. RMBlast: [ Configured ]
#   3. HMMER3.1 & DFAM: [ Configured, Default ]
#   4. ABBlast: [ Un-configured ]
#   5. Done

# Install RECON
cd /vol/storage/software
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz
tar -xzf RECON-1.08.tar.gz
rm /vol/storage/software/RECON-1.08.tar.gz
cd /vol/storage/software/RECON-1.08/src
make
make install

# Install Ninja
cd /vol/storage/software
wget https://github.com/TravisWheelerLab/NINJA/archive/0.95-cluster_only.tar.gz
tar -xzf /vol/storage/software/0.95-cluster_only.tar.gz
rm /vol/storage/software/0.95-cluster_only.tar.gz
mv /vol/storage/software/NINJA-0.95-cluster_only/NINJA/Ninja_new /vol/storage/software/NINJA-0.95-cluster_only/NINJA/Ninja

# Install LTR Retriever
cd /vol/storage/software
wget https://github.com/oushujun/LTR_retriever/archive/v2.8.tar.gz
tar -xzf v2.8.tar.gz
rm v2.8.tar.gz

# Install Mafft
cd /vol/storage/software
wget https://mafft.cbrc.jp/alignment/software/mafft-7.505-with-extensions-src.tgz
tar -xzf /vol/storage/software/mafft-7.505-with-extensions-src.tgz
rm /vol/storage/software/mafft-7.505-with-extensions-src.tgz
# Build the core programs, pointing PREFIX and BINDIR at the install location
cd /vol/storage/software/mafft-7.505-with-extensions/core
sed -i 's#PREFIX = /usr/local#PREFIX = /vol/storage/software/mafft-7.505-with-extensions#' /vol/storage/software/mafft-7.505-with-extensions/core/Makefile
sed -i 's#BINDIR = $(PREFIX)/bin#BINDIR = /vol/storage/software/mafft-7.505-with-extensions/bin#' /vol/storage/software/mafft-7.505-with-extensions/core/Makefile
make clean
make
make install
# Build the extensions with the same PREFIX and BINDIR
cd /vol/storage/software/mafft-7.505-with-extensions/extensions
sed -i 's#PREFIX = /usr/local#PREFIX = /vol/storage/software/mafft-7.505-with-extensions#' /vol/storage/software/mafft-7.505-with-extensions/extensions/Makefile
sed -i 's#BINDIR = $(PREFIX)/bin#BINDIR = /vol/storage/software/mafft-7.505-with-extensions/bin#' /vol/storage/software/mafft-7.505-with-extensions/extensions/Makefile
make clean
make
make install

# Install CD-Hit
cd /vol/storage/software
wget https://github.com/weizhongli/cdhit/archive/refs/tags/V4.8.1.tar.gz
tar -xzf /vol/storage/software/V4.8.1.tar.gz
rm /vol/storage/software/V4.8.1.tar.gz
cd /vol/storage/software/cdhit-4.8.1
make

# Install Genometools
cd /vol/storage/software
wget http://genometools.org/pub/genometools-1.6.2.tar.gz
tar -xzf /vol/storage/software/genometools-1.6.2.tar.gz
rm /vol/storage/software/genometools-1.6.2.tar.gz
cd /vol/storage/software/genometools-1.6.2
make prefix=/vol/storage/software/genometools-1.6.2 cairo=no

# Install RepeatModeler2 (it is configured further below, once its dependencies are in place)
cd /vol/storage/software
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-2.0.4.tar.gz
tar -xvf /vol/storage/software/RepeatModeler-2.0.4.tar.gz
rm /vol/storage/software/RepeatModeler-2.0.4.tar.gz
cd /vol/storage/software/RepeatModeler-2.0.4

# Install RepeatScout
cd /vol/storage/software
wget http://www.repeatmasker.org/RepeatScout-1.0.6.tar.gz
tar -xzf /vol/storage/software/RepeatScout-1.0.6.tar.gz
rm /vol/storage/software/RepeatScout-1.0.6.tar.gz
cd /vol/storage/software/RepeatScout-1.0.6
make

# Install the UCSC tools (the directory is named USCS here and in the paths below)
rm -rf /vol/storage/software/USCS
mkdir /vol/storage/software/USCS
cd /vol/storage/software/USCS
rsync -aP hgdownload.soe.ucsc.edu::genome/admin/exe/linux.x86_64/ ./

# Install the Perl modules needed by RepeatModeler2
cpan install JSON   # answer yes when prompted
#sudo cpan install File::Which
cpan install URI
cpan install Devel::Size
cpan install LWP::UserAgent

# Configure RepeatModeler2
cd /vol/storage/software/RepeatModeler-2.0.4
perl ./configure

# Paths to enter during configuration; answer yes ("y") to using LTR Retriever:
#   /vol/storage/software/RepeatMasker
#   /vol/storage/software/RECON-1.08/bin
#   /vol/storage/software/RepeatScout-1.0.6
#   /vol/storage/software/TRF-4.09.1/bin
#   /vol/storage/software/cdhit-4.8.1
#   /vol/storage/software/USCS
#   /vol/storage/software/rmblast-2.14.0/bin
#   yes
#   /vol/storage/software/genometools-1.6.2/bin
#   /vol/storage/software/LTR_retriever-2.8
#   /vol/storage/software/mafft-7.505-with-extensions/bin
#   /vol/storage/software/NINJA-0.95-cluster_only/NINJA
```
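
After working through the installations, it can help to confirm that the tools are reachable before starting the pipeline. The snippet below is a minimal sketch, not part of the SwarmGenomics pipeline itself: `check_tools` is a hypothetical helper name, and the command list assumes that the conda packages from the Preprocessing section install executables with these names. Adapt the list to the tools and environments you actually use.

```bash
# Hypothetical helper (not part of the pipeline): print every command from
# the argument list that cannot be found on PATH, one per line.
# Returns 0 if all commands were found and 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names for the conda packages in the Preprocessing section
if missing_tools=$(check_tools fastqc bwa bwa-mem2 vcftools bedtools tabix samtools trimmomatic seqkit); then
    echo "All preprocessing tools were found on PATH."
else
    echo "The following tools were not found on PATH:"
    echo "$missing_tools"
fi
```

Tools that live in their own conda environment (for example `kraken2` in the `kraken` environment) are only found while that environment is active, and the tools built under /vol/storage/software are only found if their directories have been added to `PATH`.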