# PSMC

The Pairwise Sequentially Markovian Coalescent (PSMC) model is a method used in population genetics to study the history of populations. It analyses the genetic variation within the DNA of a single diploid individual to infer how population size has changed over time. This lets you look back in time and see how the number of ancestors of a species has changed, which can reveal events such as population bottlenecks or expansions.

Here we use the software package from https://github.com/lh3/psmc

## What you need

PSMC requires **a diploid FASTQ (.fq) file** as input. This file is generated as part of the SwarmGenomics pipeline in [03.Selecting largest chromosomes](https://github.com/AureKylmanen/Swarmgenomics/blob/main/03.%20Selecting%20largest%20chromosomes.md). Alternatively, you can use a FASTQ you already have or generate one from a VCF file according to the instructions.

### Installations
#### PSMC
In your software directory (e.g. /vol/storage/software):
```
# clone the repository
git clone https://github.com/lh3/psmc.git

# make psmc
cd psmc
make

# make utils
cd utils
make
```

## Running psmc.bash
Download [**psmc.bash**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Scripts/psmc.bash) and [**params.txt**](https://github.com/AureKylmanen/Swarmgenomics/blob/main/Parameters/params.txt). Edit params.txt if you wish to change the PSMC parameters. Note that a generation time of 25 years might be appropriate for humans, but many organisms have a shorter generation time.

```
# ============================
# PSMC
# ============================
DIPLOID_FASTQ="${WORKING_DIR}/${SPECIES}/diploid.fq.gz"
DIPLOID_PSMCFA="${WORKING_DIR}/${SPECIES}/diploid.psmcfa"
DIPLOID_PSMC="${WORKING_DIR}/${SPECIES}/diploid.psmc"
PSMC_N_ITER=25
PSMC_T=15
PSMC_R=5
PSMC_PATTERN="2+2+25*2+4+6"
MUTATION_RATE=1.25e-8
GENERATION_TIME=25
```

Remember to ```chmod +x psmc.bash```.
Then run with:
```
./psmc.bash "species" "params.txt"
```
The results will be copied into the results directory within the species directory.

## Running PSMC step-by-step
#### Convert fastq files to PSMC fasta format
```
/path/to/psmc/utils/fq2psmcfa -q20 /working_dir/diploid.fq.gz > /working_dir/diploid.psmcfa

# Example
# /vol/storage/software/psmc/utils/fq2psmcfa -q20 /vol/storage/swarmGenomics/golden_eagle/diploid.fq.gz > /vol/storage/swarmGenomics/golden_eagle/diploid.psmcfa
```

#### Run the PSMC analysis
You may change the parameters according to your preferences. This step can take a few hours, so you may wish to run it in the background with "nohup" and "&".
```
nohup psmc -N25 -t15 -r5 -p "2+2+25*2+4+6" -o diploid.psmc diploid.psmcfa &

# Example
# nohup /vol/storage/software/psmc/psmc -N25 -t15 -r5 -p "2+2+25*2+4+6" -o /vol/storage/swarmGenomics/golden_eagle/diploid.psmc /vol/storage/swarmGenomics/golden_eagle/diploid.psmcfa &
```

#### Convert the output of PSMC analysis
```
utils/psmc2history.pl diploid.psmc | utils/history2ms.pl > ms-cmd.sh

# Example
# /vol/storage/software/psmc/utils/psmc2history.pl /vol/storage/swarmGenomics/golden_eagle/diploid.psmc | /vol/storage/software/psmc/utils/history2ms.pl > /vol/storage/swarmGenomics/golden_eagle/ms-cmd.sh
```

#### Plot the results
```
# Plot as pdf with default parameters
/vol/storage/software/psmc/utils/psmc_plot.pl -p diploid diploid.psmc

# Example
# /vol/storage/software/psmc/utils/psmc_plot.pl -p /vol/storage/swarmGenomics/golden_eagle/diploid diploid.psmc

# OR if you know the generation time and mutation rate, you can plot with the specific values
# (replace MUTATION_RATE and GENERATION_TIME with your own values)
/vol/storage/software/psmc/utils/psmc_plot.pl -u MUTATION_RATE -g GENERATION_TIME diploid diploid.psmc

# Example:
# utils/psmc_plot.pl -u 1.29e-8 -g 12 diploid diploid.psmc
```

## PSMC output
PSMC estimates historical effective population size (Ne) from a diploid genome. The main outputs are a .psmc file (numeric estimates) and a plot showing Ne over time.
- X-axis: Time in the past, scaled to years using the generation time and mutation rate given to `psmc_plot.pl` (`-g` and `-u`).
- Y-axis: Effective population size (Ne).

Interpretation: Peaks = population expansions, troughs = bottlenecks.

**Note:** PSMC captures long-term trends and may be less accurate for very recent events.
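
To make the scaling of the axes concrete, the sketch below converts one PSMC interval into years and Ne. It rests on an assumption about the PSMC output format that is not stated in this guide: that the `.psmc` file reports a scaled mutation rate `theta0` (on the `TR` line) and, per interval, a scaled time `t_k` and relative population size `lambda_k` (on the `RS` lines), with `N0 = theta0 / (4 * mu * s)`, time in generations `2 * N0 * t_k`, and `Ne = N0 * lambda_k`. The function name `psmc_scale` and the input values are hypothetical; `s` is the bin size, taken here as 100.

```shell
# Hypothetical helper (not part of PSMC): scale one interval to years and Ne.
# Arguments: theta0 t_k lambda_k mutation_rate generation_time [bin_size]
# Assumes N0 = theta0 / (4 * mu * s), as described in the paragraph above.
psmc_scale() {
    awk -v theta0="$1" -v t="$2" -v lam="$3" -v mu="$4" -v g="$5" -v s="${6:-100}" 'BEGIN {
        n0 = theta0 / (4 * mu * s)      # reference population size N0
        years = 2 * n0 * t * g          # scaled time -> generations -> years
        ne = n0 * lam                   # relative size -> effective population size
        printf "%.0f\t%.0f\n", years, ne
    }'
}

# Made-up values for illustration: theta0=0.05, t_k=0.5, lambda_k=2,
# with the mutation rate and generation time from params.txt.
psmc_scale 0.05 0.5 2 1.25e-8 25
# prints: 250000	20000   (years, Ne)
```

Because both axes depend on the mutation rate, and the time axis also depends on the generation time, changing either value shifts the whole curve without changing its shape.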