# Unmapped Reads
Unmapped reads are sequences from high-throughput sequencing data that do not align to the reference genome during the alignment step. These reads can reveal information ranging from microbial communities to structural variations and novel genetic sequences.

Tools such as Kraken2, which classifies these reads taxonomically, can provide valuable insights into the broader genetic landscape of a sample. Kraken2 outputs a taxonomic classification for each read, helping to identify the microbial species or other sources of the unmapped reads.

## What you need
The input for Kraken2 is a **FASTQ file**, generated from a BAM file containing unmapped reads.

### Installations
You will need to download the [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py) script and install Kraken2 with Miniconda using the following steps:

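```
# Install Kraken2 (version 2.1.3) into its own environment called kraken
conda create -n kraken -c bioconda kraken2=2.1.3

# If conda fails while solving the environment, add the conda-forge channel
# and then re-run the conda create command above:
conda config --env --add channels conda-forge

# Activate the kraken environment
conda activate kraken

# Install KrakenTools
conda install bioconda::krakentools

# Install Krona, which provides ktImportText for the visualisation step
conda install bioconda::krona
```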
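Before going further, it can help to confirm that the commands used later in this section are on your `PATH`. The function below is a minimal sketch; `check_tools` is a hypothetical helper, not part of Kraken2 or Swarmgenomics. Note that `samtools` and `bedtools`, used to create the input file, are not installed by the steps above and are assumed to be available separately.

```bash
# Hypothetical helper: report every command name that is not found on PATH.
# Returns 0 if all commands exist, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (with the kraken environment active):
# check_tools kraken2 samtools bedtools ktImportText
```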

You will also need to create or download a database. You can choose one that suits your needs from https://benlangmead.github.io/aws-indexes/k2. Here we download the PlusPF-16 database (the PlusPF collection capped at 16 GB).
```
# Make a directory for the database
mkdir kraken_database

# Download the database in the directory
cd kraken_database
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16gb_20240112.tar.gz
tar -xvzf k2_pluspf_16gb_20240112.tar.gz
```
Depending on the size of the database, this may take hours.
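Once extraction finishes, you may want to check that the directory you will pass to `--db` actually holds a Kraken2 database, which consists of the files `hash.k2d`, `opts.k2d` and `taxo.k2d`. The function below is a minimal sketch; `check_kraken_db` is a hypothetical helper name.

```bash
# Hypothetical helper: check that a directory looks like a Kraken2 database
# by testing that hash.k2d, opts.k2d and taxo.k2d exist and are non-empty.
check_kraken_db() {
    local db_dir="${1%/}"   # drop a trailing slash so paths join cleanly
    local status=0
    local f
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
# check_kraken_db /vol/storage/kraken_database/
```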

## Running unmapped_reads.bash
You will need to download and edit the [**params.txt**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Parameters/params.txt) file, and download [**unmapped_reads.bash**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/unmapped_reads.bash) and [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py).

For the unmapped reads analysis, it is important to check that the paths to the database and to the plotting script are correct, and that the conda environment name matches the one you created.

```
# ============================
# UNMAPPED READS
# ============================

KRAKEN_DB="/vol/storage/kraken_database/"
KRAKEN_CONDA_ENV=kraken
KREPORT_TO_KRONA_SCRIPT="${SCRIPTS}/kreport2krona.py"

UNMAPPED_BAM="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.bam"
UNMAPPED_FASTQ="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.fastq"
KRAKEN_OUTPUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.kraken"
KRAKEN_REPORT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.kreport"
CLASSIFIED_OUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_classified"
UNCLASSIFIED_OUT="${WORKING_DIR}/${SPECIES}/${SPECIES}_unclassified"
KRONA_HTML="${WORKING_DIR}/${SPECIES}/${SPECIES}_unmapped.krona.html"
```
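Because a wrong path here only shows up late in a run, you may want to check these settings first. The function below is a sketch under assumptions: `check_unmapped_params` is a hypothetical helper, and it assumes `params.txt` is a plain file of Bash variable assignments that can be sourced, with any referenced variables (such as `SCRIPTS`) defined earlier in the same file. It only checks that `KRAKEN_CONDA_ENV` is non-empty, not that the environment exists.

```bash
# Hypothetical helper: source a params file in a subshell and check the
# unmapped-reads settings. Returns 0 if all checks pass, 1 otherwise.
check_unmapped_params() {
    local params="$1"
    (
        # Subshell, so sourcing does not change the caller's variables.
        . "$params" || exit 1
        status=0
        if [ ! -d "${KRAKEN_DB:-}" ]; then
            echo "KRAKEN_DB is not a directory: ${KRAKEN_DB:-<unset>}" >&2
            status=1
        fi
        if [ ! -f "${KREPORT_TO_KRONA_SCRIPT:-}" ]; then
            echo "KREPORT_TO_KRONA_SCRIPT not found: ${KREPORT_TO_KRONA_SCRIPT:-<unset>}" >&2
            status=1
        fi
        if [ -z "${KRAKEN_CONDA_ENV:-}" ]; then
            echo "KRAKEN_CONDA_ENV is empty" >&2
            status=1
        fi
        exit "$status"
    )
}

# Usage:
# check_unmapped_params params.txt
```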


Remember to make the script executable with `chmod +x unmapped_reads.bash`. Then run it with:
```
./unmapped_reads.bash "species" "params.txt"
```
The results will be copied into the results directory within the species directory.
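If you have several species to process with the same params file, a small loop can call the script once per species. The function below is a hypothetical sketch: `run_unmapped_for_species` and the species names in the usage line are placeholders, not part of Swarmgenomics. It runs the species one after another and stops at the first failure.

```bash
# Hypothetical batch loop: run the unmapped-reads script once per species,
# in order, stopping at the first species whose run fails.
# Arguments: script params species...
run_unmapped_for_species() {
    local script="$1" params="$2"
    shift 2
    local species
    for species in "$@"; do
        echo "== ${species} =="
        "$script" "$species" "$params" || {
            echo "failed: ${species}" >&2
            return 1
        }
    done
}

# Usage (species names are placeholders):
# run_unmapped_for_species ./unmapped_reads.bash params.txt species_a species_b
```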

### Classifying reads
To classify the unmapped reads from a whole-genome sequencing project we use the `kraken2` command. Replace `${DBPATH}` with the path to the directory where you extracted the database, and the other placeholders with your own file names. The input file is a FASTQ file of the unmapped reads. You can also set the number of threads kraken2 uses with `--threads`. The `--classified-out` and `--unclassified-out` flags are optional; they write the classified and unclassified reads to separate files.
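One way to keep the optional flags optional is to build the argument list in a Bash array. The function below is a sketch, not the Swarmgenomics script: `run_kraken2` is a hypothetical wrapper, it names outputs after a prefix you choose, it keeps `--confidence 0.1` as used in this section, and it adds the two read-saving flags only when the fifth argument is `yes`. The `KRAKEN2` variable is a hypothetical override for the kraken2 executable (default `kraken2`).

```bash
# Hypothetical wrapper around kraken2.
# Arguments: db input prefix [threads] [save_reads: yes|no]
run_kraken2() {
    local db="$1" input="$2" prefix="$3" threads="${4:-1}" save_reads="${5:-no}"
    local args=(--db "$db" --output "${prefix}.kraken" --use-names
                --report "${prefix}.kreport" --confidence 0.1 --threads "$threads")
    # Only request the classified/unclassified read files when asked to.
    if [ "$save_reads" = yes ]; then
        args+=(--classified-out "${prefix}_classified"
               --unclassified-out "${prefix}_unclassified")
    fi
    "${KRAKEN2:-kraken2}" "${args[@]}" "$input"
}

# Usage:
# run_kraken2 /vol/storage/kraken_database/ unmapped.fastq unmapped 26 yes
```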

##### Creating the input file
```
# Create a BAM file with unmapped reads
samtools view -b -f 4 bwa.sorted.bam > unmapped.bam

# Change to fastq format
bedtools bamtofastq -i unmapped.bam -fq unmapped.fastq
```
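Before spending hours on classification, a quick check of the FASTQ file can catch a truncated or malformed conversion. The function below is a minimal sketch; `fastq_check` is a hypothetical helper, and it assumes an uncompressed FASTQ with four lines per record (header starting with `@`, sequence, separator starting with `+`, quality). It prints the number of reads on success.

```bash
# Minimal FASTQ sanity check, assuming uncompressed 4-line records.
# Checks header/separator prefixes and that sequence and quality lengths
# match; prints the read count, or exits non-zero on the first problem.
fastq_check() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR > "/dev/stderr"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR > "/dev/stderr"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR > "/dev/stderr"; bad = 1; exit }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { print "truncated final record" > "/dev/stderr"; exit 1 }
            print NR / 4
        }
    ' "$1"
}

# Usage:
# fastq_check unmapped.fastq
```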

##### Running kraken2
Activate the kraken environment and run kraken2:
```
conda activate kraken

kraken2 \
--db ${DBPATH} \
--output ${OUTPUTNAME} \
--use-names \
--report ${REPORTNAME} \
--classified-out ${CLASSIFIEDNAME} \
--unclassified-out ${UNCLASSIFIEDNAME} \
--confidence 0.1 \
--threads ${THREADNUM} \
${INPUTNAME}

# Example
kraken2 \
--db /vol/storage/kraken_database/ \
--output unmapped.kraken \
--use-names \
--report unmapped.kreport \
--classified-out classified \
--unclassified-out unclassified \
--confidence 0.1 \
--threads 26 \
unmapped.fastq
```
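In the per-read output file (`--output`), the first tab-separated column is `C` for classified reads and `U` for unclassified ones, so a short awk script gives a quick overview of how many unmapped reads Kraken2 could assign. The function below is a minimal sketch; `kraken_summary` is a hypothetical helper name.

```bash
# Count classified (C) and unclassified (U) reads in a Kraken2 per-read
# output file, using the first tab-separated column, and print the
# percentage of reads that were classified.
kraken_summary() {
    awk -F '\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END {
            total = c + u
            pct = (total > 0) ? 100 * c / total : 0
            printf "classified\t%d\nunclassified\t%d\nclassified_pct\t%.2f\n", c, u, pct
        }
    ' "$1"
}

# Usage:
# kraken_summary unmapped.kraken
```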

### Visualising the results
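You can then run the [**kreport2krona.py**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/kreport2krona.py) script to convert the Kraken2 report into Krona's text format, and `ktImportText` (part of KronaTools) to turn that into an interactive HTML file for visualising the results.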
```
# Copy kreport2krona.py into your working directory, then:
# Convert the Kraken2 report to Krona's text format
python3 kreport2krona.py -r REPORTNAME -o SAMPLE.krona
# Build the interactive HTML chart
ktImportText SAMPLE.krona -o SAMPLE.krona.html

# Example
python3 kreport2krona.py -r unmapped.kreport -o SAMPLE.krona
ktImportText SAMPLE.krona -o SAMPLE.krona.html
```
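Alongside the Krona chart, a plain-text list of the most abundant species can be handy. The function below is a sketch; `top_species` is a hypothetical helper, and it assumes the standard six-column Kraken2 report (percentage, clade reads, reads assigned directly, rank code, taxid, indented name). Reports written with `--report-minimizer-data` have extra columns and would need different field numbers.

```bash
# List the N species-level entries (rank code "S") with the most clade
# reads in a Kraken2 report (default N = 10).
# Output columns: clade reads, percentage, species name.
top_species() {
    local report="$1" n="${2:-10}"
    awk -F '\t' 'BEGIN { OFS = "\t" }
        $4 == "S" {
            pct = $1; gsub(/ /, "", pct)        # percentage is space-padded
            name = $6; sub(/^ +/, "", name)     # names are indented by depth
            print $2, pct, name
        }' "$report" |
        sort -t "$(printf '\t')" -k1,1nr |
        head -n "$n"
}

# Usage:
# top_species unmapped.kreport 20
```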
## Resources
- Kraken2 manual, including how to build your own database: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown
- KrakenTools documentation: https://github.com/jenniferlu717/KrakenTools/blob/master/README.md