Modified Rogers’ distance and assessment of genetic patterns

Miguel Correa Abondano; Jessica Alejandra Ospina; Peter Wenzl; Monica Carvajal-Yepes

Improve Research Reproducibility A Bio-protocol resource

Concise Method

This method is extracted from research article: Front Plant Sci, Jul 2024

Sampling strategies for genotyping common bean (Phaseolus vulgaris L.) Genebank accessions with DArTseq: a comparison of single plants, multiple plants, and DNA pools

DOI: 10.3389/fpls.2024.1338332

The modified Rogers’ distance (MRD) was calculated both between pairs of accessions within datasets and between samples of the same accession from different subsets. The calculation was based on matrices of allelic frequencies, one for each type of pool (Wright, 1978, p. 91). The pairwise distances were calculated as follows:

$$\mathrm{MRD}_{xy} = \frac{1}{\sqrt{2L}} \sqrt{\sum_{j=1}^{L} \sum_{i} \left( {\hat{p}}_{i j (x)} - {\hat{p}}_{i j (y)} \right)^{2}}$$

where $\mathrm{MRD}_{xy}$ is the distance between samples x and y; L is the number of SNPs in the dataset; ${\hat{p}}_{i j (x)}$ is the frequency of the ith allele at the jth locus of sample x; ${\hat{p}}_{i j (y)}$ is the frequency of the ith allele at the jth locus of sample y; and the inner sum runs over the alleles at locus j. The matrices were calculated using a custom R script.
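The custom R script is not reproduced in this method, so the following is only a minimal sketch of how such a matrix could be computed. It assumes the standard form of the distance, the square root of the summed squared allele-frequency differences divided by 2L, and biallelic SNPs stored as a samples-by-loci matrix holding the frequency of one allele per SNP. The function name `mrd_matrix`, the matrix layout, and the handling of missing values (loci missing in either sample are skipped) are assumptions made for illustration.

```r
# Sketch of a pairwise modified Rogers' distance for biallelic SNPs.
# `freq`: samples x loci matrix with the frequency of ONE allele per SNP
# (hypothetical layout; the second allele is taken as 1 - freq).
mrd_matrix <- function(freq) {
  stopifnot(is.matrix(freq), is.numeric(freq), nrow(freq) >= 2)
  n <- nrow(freq)
  out <- matrix(0, n, n, dimnames = list(rownames(freq), rownames(freq)))
  for (x in seq_len(n - 1)) {
    for (y in (x + 1):n) {
      # Assumption: use only loci scored in both samples.
      ok <- !is.na(freq[x, ]) & !is.na(freq[y, ])
      L <- sum(ok)
      d <- freq[x, ok] - freq[y, ok]
      # Both alleles contribute the same squared difference, hence 2 * d^2.
      out[x, y] <- out[y, x] <- sqrt(sum(2 * d^2) / (2 * L))
    }
  }
  out
}

# Toy frequencies for three samples at four SNPs (made-up values).
freq <- rbind(
  A = c(0, 0, 1, 1),
  B = c(1, 1, 0, 0),
  C = c(0, 0, 1, 0.5)
)
d <- mrd_matrix(freq)
print(round(d, 3))
```

In this toy input, samples A and B are fixed for opposite alleles at every SNP and sit at the maximum distance of 1, while A and C differ at a single locus.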

We used several analyses to characterize the genetic patterns within our dataset and to compare outputs across types of pools. Principal coordinate analysis (PCoA), a dimensionality-reduction method, was applied to the MRD matrix using the “gl.pcoa” function from the “dartR” package to obtain a two-dimensional representation of the data. For clustering, the MRD matrix was clustered with the complete linkage algorithm from the “stats” R package (V4.0.4) (R Core Team, 2022). The nodes of the resulting dendrogram were tested by bootstrap analysis with the “boot.phylo” function of the “ape” package (V5.4.1; Paradis and Schliep, 2019), using the parameters “rooted = FALSE” and “B = 1000”.
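The ordination and clustering steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' script: `stats::cmdscale` stands in for “gl.pcoa” (both perform classical PCoA, but their outputs are organized differently), the allele frequencies are simulated, and `mrd` and `bootstrap_support` are hypothetical helper names. The bootstrap helper resamples loci through `ape::boot.phylo`; it is only defined here, not run, because it requires the “ape” package.

```r
# Simulated allele frequencies: six samples in two groups, 50 SNPs
# (illustrative data only, not from the study).
set.seed(1)
freq <- rbind(
  matrix(0.1 + runif(150, -0.05, 0.05), nrow = 3),
  matrix(0.9 + runif(150, -0.05, 0.05), nrow = 3)
)
rownames(freq) <- paste0("acc", 1:6)

# For biallelic SNPs without missing data, MRD reduces to the Euclidean
# distance between frequency vectors divided by sqrt(L).
mrd <- function(m) dist(m) / sqrt(ncol(m))
d <- mrd(freq)

# Classical PCoA on the distance matrix, keeping two axes
# (base-R substitute for dartR's gl.pcoa).
pcoa <- cmdscale(d, k = 2, eig = TRUE)

# Complete-linkage hierarchical clustering of the same matrix.
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)

# Node support by resampling loci (columns); needs the 'ape' package.
# Defined for illustration and not executed in this sketch.
bootstrap_support <- function(m, B = 1000) {
  build_tree <- function(x) ape::as.phylo(hclust(mrd(x), method = "complete"))
  ape::boot.phylo(build_tree(m), m, build_tree, B = B, rooted = FALSE)
}

print(round(pcoa$points, 3))
print(groups)
```

Passing the tree-building function to `boot.phylo` means that every bootstrap replicate recomputes the distance matrix from resampled loci before clustering, so the support values reflect the whole pipeline rather than the dendrogram alone.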

To explore population admixture, we compared the best estimate of K ancestral populations derived from all individuals, the seq-pools, or a single individual per accession. This comparison was conducted using the “snmf” function of the “LEA” R package (V3.2.0; Frichot and François, 2015). To run “snmf” with the seq-pools, the standard output from DArTseq was used because the input files for the “LEA” package are designed for allele counts, not allele frequencies. To run the analysis, the data (individuals, seq-pools, and single plants), stored as “genlight” objects, were converted into STRUCTURE input files using the “gl2structure” function of the “dartR” package (with the option “exportMarkerNames = FALSE” and all others as default). The STRUCTURE-formatted files were then converted into the geno format with the “struct2geno” function of “LEA” (parameters: “ploidy = 2, FORMAT = 2, extra.row = 0, extra.column = 1”). The “snmf” method was executed for each dataset with the parameters “K = 1:20, ploidy = 2, entropy = TRUE, CPU = 20, repetitions = 5, iterations = 500, alpha = 100”. The optimal K, indicating the most likely number of ancestral populations given the data, was determined using the cross-entropy criterion by selecting the point where the cross-entropy reached a plateau. Initially, the “snmf” run with individual samples did not display a plateau, leading to an additional run with K-values from 40 to 55. Visual representations, including bar plots of admixture coefficients and plots of cross-entropy values across K-values, were generated using the “ggplot2” package (V3.3.3; Wickham, 2016).
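The conversion and “snmf” steps described above could be chained roughly as in the sketch below. Several points are assumptions rather than facts from the method: the wrapper `run_snmf`, the helper `pick_plateau_k`, and the tolerance `tol` are hypothetical; the output file name `<input>.geno` is how “LEA” is assumed to name the converted file; and the cross-entropy is averaged over repetitions. The text does not say how the plateau was identified, so the numeric rule here (the first K after which the relative decrease in cross-entropy falls below `tol`) is only one possible heuristic. The wrapper needs the “LEA” package and is defined but not run.

```r
# Hypothetical wrapper around the LEA calls named in the text.
# Returns the mean cross-entropy per K; requires the 'LEA' package.
run_snmf <- function(str_file, k_values = 1:20) {
  LEA::struct2geno(str_file, ploidy = 2, FORMAT = 2,
                   extra.row = 0, extra.column = 1)
  geno_file <- paste0(str_file, ".geno")  # assumed output file name
  project <- LEA::snmf(geno_file, K = k_values, ploidy = 2, entropy = TRUE,
                       CPU = 20, repetitions = 5, iterations = 500,
                       alpha = 100, project = "new")
  # Assumption: summarize the repetitions of each K by their mean.
  sapply(k_values, function(k) mean(LEA::cross.entropy(project, K = k)))
}

# Heuristic plateau rule (an assumption, not the authors' criterion):
# return the first K whose relative drop to the next K is below `tol`,
# or NA when the curve is still decreasing over the whole range.
pick_plateau_k <- function(k_values, ce, tol = 0.01) {
  stopifnot(length(k_values) == length(ce), length(ce) >= 2)
  rel_drop <- -diff(ce) / abs(ce[-length(ce)])
  flat <- which(rel_drop < tol)
  if (length(flat) == 0) return(NA_integer_)
  k_values[flat[1]]
}

# Made-up cross-entropy curves for illustration.
ce_plateau <- c(0.80, 0.60, 0.50, 0.498, 0.497, 0.497)
ce_falling <- c(0.80, 0.70, 0.60, 0.50)
k_best <- pick_plateau_k(1:6, ce_plateau)
k_none <- pick_plateau_k(1:4, ce_falling)
print(k_best)
print(k_none)
```

A missing result from the helper corresponds to the situation reported for the individual samples, where no plateau appeared in the first range of K and a second run over larger K-values was needed.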

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
