# Mapping the Reads and Variant Calling #### Index reference genome ``` bwa index /vol/storage/swarmGenomics/golden_eagle/acreference.fna.gz ``` ### Map the reads ``` # Align paired-end reads to reference genome bwa mem -t 13 /vol/storage/swarmGenomics/golden_eagle/acreference.fna.gz /vol/storage/swarmGenomics/golden_eagle/fastq/ERR3316068_1_paired.fastq.gz /vol/storage/swarmGenomics/golden_eagle/fastq/ERR3316068_2_paired.fastq.gz > /vol/storage/swarmGenomics/golden_eagle/ERR3316068.sam # Sort aligned reads and save as BAM samtools sort -@ 13 /vol/storage/swarmGenomics/golden_eagle/ERR3316068.sam -o /vol/storage/swarmGenomics/golden_eagle/bwa.sorted.bam # Index sorted BAM file samtools index /vol/storage/swarmGenomics/golden_eagle/bwa.sorted.bam ``` Note on Multiple Read Groups: If your sequencing data comes from multiple runs (e.g., several SRA accessions or FASTQ files), make sure each is assigned a distinct read group (@RG) when aligning with BWA. Use the same SM (sample name) for all to ensure variant calling is performed on a single individual. You can align separately and merge BAMs with samtools merge. #### Index the reference ``` #Unzip the reference genome for samtools gunzip /vol/storage/swarmGenomics/golden_eagle/acreference.fna.gz #Index the reference samtools faidx /vol/storage/swarmGenomics/golden_eagle/acreference.fna ``` ### Variant calling ``` /vol/storage/bcftools-1.19/bcftools mpileup -C50 -f /vol/storage/swarmGenomics/golden_eagle/acreference.fna /vol/storage/swarmGenomics/golden_eagle/bwa.sorted.bam | /vol/storage/bcftools-1.19/bcftools call -c -o /vol/storage/swarmGenomics/golden_eagle/output.vcf # This step might take long so use nohup bash -c '/vol/storage/bcftools-1.19/bcftools mpileup -C50 -f /vol/storage/swarmGenomics/golden_eagle/acreference.fna /vol/storage/swarmGenomics/golden_eagle/bwa.sorted.bam | /vol/storage/bcftools-1.19/bcftools call -c -o /vol/storage/swarmGenomics/golden_eagle/output.vcf' > mpileup.log 2>&1 & ``` #### Remove repeats ``` #Create bed file with repeats perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (sort(keys(%fa))){ print "$s\n$fa{$s}\n" }}' /vol/storage/swarmGenomics/golden_eagle/acreference.fna | perl -lne 'if(/^>(\S+)/){ $n=$1} else { while(/([a-z]+)/g){ printf("%s\t%d\t%d\n",$n,pos($_)-length($1),pos($_)) } }' > /vol/storage/swarmGenomics/golden_eagle/repeats.bed #Remove repeats from vcf bedtools subtract -header -a /vol/storage/swarmGenomics/golden_eagle/output.vcf -b /vol/storage/swarmGenomics/golden_eagle/repeats.bed > /vol/storage/swarmGenomics/golden_eagle/output_no_repeats.vcf #make vcf without repeats standard vcf file mv /vol/storage/swarmGenomics/golden_eagle/output_no_repeats.vcf /vol/storage/swarmGenomics/golden_eagle/output.vcf ``` #### Gzip and index the file ``` /vol/storage/bcftools-1.19/bcftools view /vol/storage/swarmGenomics/golden_eagle/output.vcf -Oz -o /vol/storage/swarmGenomics/golden_eagle/output.vcf.gz #index vcf file (output.vcf.gz.tbi) tabix -p vcf /vol/storage/swarmGenomics/golden_eagle/output.vcf.gz ```