# Heterozygosity

Understanding genetic diversity within a population is crucial for addressing the complexities of evolution, adaptation, and conservation. Heterozygosity, one of the most important indicators of genetic diversity, measures the presence of different alleles at a gene locus within an individual’s genome. High heterozygosity indicates healthy genetic variation, which allows populations to adapt to environmental changes and avoid inbreeding depression. In contrast, low heterozygosity may suggest inbreeding, genetic drift, or population bottlenecks, all of which can increase the risk of extinction.

## What you need

The heterozygosity analysis uses **VCF files**, either a single file you already have or multiple files generated by the SwarmGenomics pipeline. A VCF (Variant Call Format) file contains information about genetic variants, such as SNPs and indels, across the genome for one or more samples.

### Installations

You need bcftools, which should already be installed. If not, see [0.Installations](https://github.com/AureKylmanen/Swarmgenomics/blob/main/0.%20Installations.md) for instructions on installing bcftools.

## Running heterozygosity.bash

The script runs through the VCF files created in the preprocessing steps of the SwarmGenomics pipeline. You will need to download and edit the [**params.txt**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Parameters/params.txt) file, and download [**heterozygosity.bash**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/heterozygosity.bash) and [**plot_heterozygosity.R**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/plot_heterozygosity.R).

In **params.txt**, edit:

```
# ============================
# WORKING DIRECTORIES
# ============================
# Important to change!
WORKING_DIR="/vol/storage/swarmgenomics"
TOOL_DIR="/vol/storage/software"
SCRIPTS="/vol/storage/swarmgenomics/scripts"

# ============================
# TOOLS PATHS
# ============================
BCFTOOLS="/${TOOL_DIR}/bcftools-1.19"

# ============================
# HETEROZYGOSITY
# ============================
# Number of scaffolds to plot
TOP_SCAFFOLDS=20
HET_PLOT_COLOR="skyblue"
HET_PLOT_SCRIPT="${SCRIPTS}/plot_heterozygosity.R"
```

Remember to ```chmod +x heterozygosity.bash```. Then run with:

```
./heterozygosity.bash "species" "params.txt"
```

The results will be copied into the results directory within the species directory.

### Output

The output graph shows heterozygosity for each of the largest scaffolds/chromosomes. The number of scaffolds plotted is set by `TOP_SCAFFOLDS` in **params.txt**.

## Estimating Heterozygosity on a single VCF file

If you want to quickly calculate heterozygosity for a single VCF file without running the full script, you can use the following one-liner:

```
# Give a path to bcftools if needed
# Change file.vcf according to your file name
bcftools query -f '[%GT\n]' file.vcf | awk '{het+=($0=="0/1"||$0=="1/0"); hom+=($0=="0/0"||$0=="1/1")} END{if(het+hom>0) print het/(het+hom); else print 0}'
```

This prints the heterozygosity value, calculated as the fraction of heterozygous genotypes (`0/1` or `1/0`) among all genotypes counted (`0/0`, `0/1`, `1/0`, and `1/1`). If no genotypes are counted, it prints 0.
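
The one-liner above matches only unphased, biallelic genotype strings, so phased calls such as `0|1` and multi-allelic calls such as `1/2` are silently left out of the count. The following is a minimal sketch of a more permissive variant, not part of the SwarmGenomics scripts: `het_fraction` is a hypothetical helper name, and it assumes diploid genotypes, one per line, in the format produced by `bcftools query -f '[%GT\n]'`.

```bash
#!/usr/bin/env bash
# het_fraction: hypothetical helper (an illustration, not part of SwarmGenomics).
# Reads one genotype string per line on stdin and prints the fraction of
# called diploid genotypes that are heterozygous.
het_fraction() {
  awk '
    {
      # Split the genotype on "/" (unphased) or "|" (phased).
      n = split($0, a, "[/|]")
      # Skip haploid calls and genotypes with a missing allele (".").
      if (n != 2 || a[1] == "." || a[2] == ".") next
      # Two identical alleles are homozygous; anything else is heterozygous.
      if (a[1] == a[2]) hom++; else het++
    }
    END {
      if (het + hom > 0) print het / (het + hom); else print 0
    }
  '
}

# Usage (file.vcf is a placeholder, as in the one-liner above):
#   bcftools query -f '[%GT\n]' file.vcf | het_fraction
```

Because this variant counts more genotype classes, its result can differ from the one-liner on VCF files that contain phased or multi-allelic sites. Which definition is appropriate depends on how the VCF was called and filtered.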