# Runs of Homozygosity

Runs of Homozygosity (ROH) are continuous stretches of homozygous genotypes within an individual's genome. They arise when both copies of a segment were inherited from a common ancestor. ROH can provide insights into the genetic history of populations, including levels of inbreeding, past population bottlenecks, and patterns of natural selection.

## What you need

The Runs of Homozygosity analysis uses **VCF files**. You can use either a single file you already have or the multiple files generated by the SwarmGenomics pipeline. A VCF (Variant Call Format) file contains information about genetic variants, such as SNPs and indels, across the genome for one or more samples.

### Installations

You need bcftools, which should already be installed. See [0.Installations](https://github.com/AureKylmanen/Swarmgenomics/blob/main/0.%20Installations.md) for instructions.

Install the required R packages:

```
# Open R from the shell
R

# Install packages (when prompted, choose a CRAN mirror, e.g. 39)
install.packages(c("ggplot2", "dplyr", "gridExtra", "data.table"))
```

## Running runs_of_homozygosity.bash

Download and edit the [**params.txt**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Parameters/params.txt) file, then download [**runs_of_homozygosity.bash**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/runs_of_homozygosity.bash) and [**plot_roh.R**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/plot_roh.R).

The following parameters in params.txt control how Runs of Homozygosity are detected and visualized. You can modify them depending on data quality, coverage, and your biological question.

- ```ROH_MIN_GQ``` – Minimum genotype quality required for a site to be used in ROH detection. Increase this for stricter filtering (recommended for high-coverage data).
- ```ROH_DEFAULT_AF``` – Default allele frequency assumed when AF is missing. Adjust this if you are working with populations with very different expected allele frequencies.
- ```ROH_PLOT_SCRIPT``` – Path to the R script used for plotting ROH.
- ```ROH_COUNT_COLOR``` – Bar color for the number of ROH per size class.
- ```ROH_LENGTH_COLOR``` – Bar color for the total length of ROH per size class.
- ```ROH_BINS``` – Size thresholds (in Mb) used to classify ROH.
- ```ROH_BIN_LABELS``` – Labels for the resulting size classes. There must be one fewer label than there are thresholds in ```ROH_BINS``` (five thresholds give four classes).

```
# ============================
# RUNS OF HOMOZYGOSITY
# ============================
ROH_MIN_GQ=30
ROH_DEFAULT_AF=0.4

# ROH plotting
ROH_PLOT_SCRIPT="${SCRIPTS}/plot_roh.R"
ROH_COUNT_COLOR="#48c9b0"
ROH_LENGTH_COLOR="#f5b041"

# ROH length bins (Mb)
ROH_BINS="0.01,0.1,1,3,Inf"
ROH_BIN_LABELS="10kbp-0.1Mbp,0.1-1Mbp,1-3Mbp,>3Mbp"
```

Remember to make the script executable with ```chmod +x runs_of_homozygosity.bash```. Then run it with:

```
./runs_of_homozygosity.bash "species" "params.txt"
```

The results will be copied into the results directory within the species directory.

## Running Runs of Homozygosity analysis step-by-step

The preprocessing steps split the VCF files by scaffold, but the ROH analysis requires them to be merged into a single file. If you are not following the pipeline and have a single VCF file, skip to Step 2 and change the input file name accordingly. You should also download [**all_roh_bar_plots.R**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/all_roh_bar_plots.R) for plotting.

```
# Step 0: Change to the directory containing the VCF files
cd vcf

# Step 1: Merge multiple VCF files (skip if you have a single VCF)
# bcftools merge expects bgzipped, indexed input (bcftools index FILE.vcf.gz)
bcftools merge --force-samples -O z -o merged.vcf.gz *.vcf.gz

# Step 2: Run the bcftools ROH analysis
# Change the name of the VCF file if you skipped the previous step
bcftools roh -G30 --AF-dflt 0.4 merged.vcf.gz > roh_results.txt

# Step 3: Extract only the RG lines (header + data)
grep -E "^# RG|^RG" roh_results.txt > RG.txt
```

- ```-G30``` makes bcftools roh use the called genotypes (GT) and ignore the genotype likelihoods (PL), assigning a phred-scaled likelihood of 30 to the two less likely genotypes. The bcftools documentation suggests 30 as a safe value to allow for genotype-calling errors.
- ```--AF-dflt 0.4``` sets the allele frequency assumed for sites without a known allele frequency.
- RG.txt contains only the ROH segments (RG lines) and their header, ready for further analysis.

### Plotting ROH results

```
# Plot ROH
Rscript all_roh_bar_plots.R
```

## ROH results

The bar plot provides an overview of the **number** and **total length** of Runs of Homozygosity (ROH) in the following size categories:

- **<0.1 Mbp:** Short ROH, typically reflecting ancient homozygosity.
- **0.1 – 1 Mbp:** ROH in this range often result from distant shared ancestry within a population.
- **1 – 3 Mbp:** These ROH suggest closer relatedness, typically found in populations with some degree of inbreeding.
- **>3 Mbp:** Long ROH indicate recent inbreeding, often due to mating between closely related individuals.
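To cross-check the bar plots, or to try other size classes, you can tally the number and total length of ROH per class directly from RG.txt. The following is a minimal sketch, not the logic of plot_roh.R or all_roh_bar_plots.R. `summarise_roh` is a hypothetical helper name. The sketch assumes tab-separated RG lines with the run length in bp in column 6, as bcftools roh writes them, and it pools all samples in the file. It also uses right-closed classes such as (1, 3] Mb, like R's `cut()` default; whether the plotting scripts treat boundaries the same way is an assumption.

```shell
# Hypothetical helper: summarise_roh RG_FILE [BINS] [LABELS]
# BINS and LABELS default to the ROH_BINS / ROH_BIN_LABELS values in params.txt.
# Prints: label <TAB> number of ROH <TAB> total length (Mb), one line per class.
summarise_roh() {
  local rg_file="$1"
  local bins="${2:-0.01,0.1,1,3,Inf}"
  local labels="${3:-10kbp-0.1Mbp,0.1-1Mbp,1-3Mbp,>3Mbp}"
  awk -F'\t' -v bins="$bins" -v labels="$labels" '
    BEGIN {
      nb = split(bins, edge, ",")   # class thresholds in Mb; the last may be "Inf"
      split(labels, lab, ",")       # one label per class (nb - 1 labels)
    }
    $1 == "RG" {                    # skip the "# RG" header, keep data lines
      mb = ($6 + 0) / 1e6           # column 6 = ROH length in bp, converted to Mb
      for (i = 1; i < nb; i++) {
        # Right-closed class (edge[i], edge[i+1]]; "Inf" means no upper limit
        upper = (edge[i + 1] == "Inf") ? mb : edge[i + 1] + 0
        if (mb > edge[i] + 0 && mb <= upper) { count[i]++; total[i] += mb; break }
      }
    }
    END {
      for (i = 1; i < nb; i++)
        printf "%s\t%d\t%.2f\n", lab[i], count[i], total[i]
    }
  ' "$rg_file"
}

# Example: summarise_roh RG.txt
```

Runs at or below the lowest threshold (10 kbp with the default bins) are not counted, mirroring how `cut()` leaves values outside the breaks unassigned. Passing a different threshold list and label list as the second and third arguments lets you preview other ```ROH_BINS``` / ```ROH_BIN_LABELS``` choices before editing params.txt.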