# Unmapped Reads

Unmapped reads are sequences from high-throughput sequencing data that do not align to the reference genome during the alignment process. These reads can reveal information ranging from microbial communities to structural variations and novel genetic sequences. Tools such as Kraken2, which taxonomically classify these reads, can provide valuable insights into the broader genetic landscape of a sample. Kraken2 assigns a taxonomic classification to each read, helping to identify the microbial species or other sources of the unmapped reads.

## What you need

The input for Kraken2 is a **FASTQ file** generated from a BAM file containing the unmapped reads.

### Installations

You will need to download the [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py) script, and install Kraken2 with Miniconda using the following steps:

```
# Install Kraken2 (version 2.1.3) into its own environment called kraken
conda create -n kraken -c bioconda kraken2=2.1.3

# If you encounter an error while solving the environment, add the conda-forge channel and re-run the command above:
conda config --env --add channels conda-forge

# Activate the kraken environment
conda activate kraken

# Install KrakenTools
conda install bioconda::krakentools
```

You will also need to build or download a Kraken2 database. You can choose one that suits your needs from https://benlangmead.github.io/aws-indexes/k2. Here we download the PlusPF-16 database (`k2_pluspf_16gb_20240112`).

```
# Make a directory for the database
mkdir kraken_database

# Download and extract the database into that directory
cd kraken_database
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16gb_20240112.tar.gz
tar -xvzf k2_pluspf_16gb_20240112.tar.gz
```

Depending on the size of the database, the download and extraction may take hours.
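Because a large download can stop partway, it can be worth checking the extracted database before using it. A Kraken2 database directory contains three core files: `hash.k2d`, `opts.k2d` and `taxo.k2d`. The function below is a minimal sketch that only checks those three files exist and are non-empty. The name `check_kraken_db` is hypothetical; it is not part of Kraken2 or of this pipeline.

```bash
# Minimal sketch: check that a directory contains the three core Kraken2
# database files. check_kraken_db is a hypothetical helper, not part of Kraken2.
check_kraken_db() {
    local db_dir="$1"
    local missing=0
    local f
    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "${db_dir}/${f}" ]; then
            echo "Missing or empty: ${db_dir}/${f}" >&2
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "Kraken2 database files found in ${db_dir}"
    fi
    # Exit status 0 = all three files present, 1 = at least one missing
    return "$missing"
}
```

For example, `check_kraken_db /vol/storage/kraken_database/` reports any missing file on stderr and returns a non-zero status. In that case the download or extraction probably did not finish. `kraken2-inspect` can give a fuller picture of the database contents.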
## Running unmapped_reads.bash

You will need to download and edit the [**params.txt**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Parameters/params.txt) file, and download [**unmapped_reads.bash**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/unmapped_reads.bash) and [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py).

For the unmapped reads analysis, make sure the following are correct: the path to the database, the path to the plotting script, and the name of the conda environment.

```
# ============================
# UNMAPPED READS
# ============================
KRAKEN_DB="/vol/storage/kraken_database/"
KRAKEN_CONDA_ENV=kraken
KREPORT_TO_KRONA_SCRIPT="${SCRIPTS}/kreport2krona.py"
UNMAPPED_BAM="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.bam"
UNMAPPED_FASTQ="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.fastq"
KRAKEN_OUTPUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.kraken"
KRAKEN_REPORT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.kreport"
CLASSIFIED_OUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_classified"
UNCLASSIFIED_OUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unclassified"
KRONA_HTML="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.krona.html"
```

Remember to make the script executable with `chmod +x unmapped_reads.bash`. Then run it with:

```
./unmapped_reads.bash "species" "params.txt"
```

The results will be copied into the results directory within the species directory.

### Classifying reads

To classify the unmapped reads from a whole-genome sequencing project, we use the `kraken2` command. Replace `${DBPATH}` with the path to the directory where you saved the database. The input file is a FASTQ file of the unmapped reads. `${THREADNUM}` sets the number of threads Kraken2 uses. The `--classified-out` and `--unclassified-out` flags are optional. They write the classified and unclassified reads to separate files.
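Once the input FASTQ has been created, a quick sanity check before classifying is to confirm it actually contains reads. The helper below is a hypothetical sketch, not part of the pipeline scripts. It assumes an uncompressed FASTQ with four lines per record (header, sequence, `+`, qualities), which is the layout `bedtools bamtofastq` writes.

```bash
# Hypothetical helper: count the reads in an uncompressed FASTQ file,
# assuming four lines per record (header, sequence, '+', qualities).
fastq_read_count() {
    local fastq="$1"
    local lines
    lines=$(wc -l < "$fastq") || return 1
    # A line count that is not a multiple of 4 suggests a truncated or wrapped file
    if [ $((lines % 4)) -ne 0 ]; then
        echo "Warning: ${fastq} has ${lines} lines, not a multiple of 4" >&2
    fi
    echo $((lines / 4))
}
```

For example, `fastq_read_count unmapped.fastq` should give roughly the same number as `samtools view -c -f 4 bwa.sorted.bam`. A count of 0 means there is nothing for Kraken2 to classify.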
##### Creating the input file

Extract the unmapped reads (SAM flag 4) from the sorted alignment with samtools, then convert them to FASTQ with bedtools:

```
# Create a BAM file containing only the unmapped reads (SAM flag 4)
samtools view -b -f 4 bwa.sorted.bam > unmapped.bam

# Convert to FASTQ format
bedtools bamtofastq -i unmapped.bam -fq unmapped.fastq
```

##### Running kraken2

Activate the kraken environment and run kraken2:

```
conda activate kraken

kraken2 \
  --db "${DBPATH}" \
  --output "${OUTPUTNAME}" \
  --use-names \
  --report "${REPORTNAME}" \
  --classified-out "${CLASSIFIEDNAME}" \
  --unclassified-out "${UNCLASSIFIEDNAME}" \
  --confidence 0.1 \
  --threads "${THREADNUM}" \
  "${INPUTNAME}"

# Example
kraken2 \
  --db /vol/storage/kraken_database/ \
  --output unmapped.kraken \
  --use-names \
  --report unmapped.kreport \
  --classified-out classified \
  --unclassified-out unclassified \
  --confidence 0.1 \
  --threads 26 \
  unmapped.fastq
```

### Visualising the results

You may then run the [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py) script, which converts the Kraken2 report into Krona's text format. `ktImportText`, which is part of KronaTools, then turns that file into an interactive HTML page for visualising the results.

```
# Copy kreport2krona.py into your directory
# Convert the report and build the HTML plot
python3 kreport2krona.py -r REPORTNAME -o SAMPLE.krona
ktImportText SAMPLE.krona -o SAMPLE.krona.html

# Example
# python3 kreport2krona.py -r unmapped.kreport -o SAMPLE.krona
# ktImportText SAMPLE.krona -o SAMPLE.krona.html
```

# Resources

- Building your own Kraken2 database: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown
- KrakenTools: https://github.com/jenniferlu717/KrakenTools/blob/master/README.md