The modified Rogers’ distance (MRD) was calculated both between pairs of accessions within datasets and between samples of the same accession from different subsets. This calculation was based on matrices of allelic frequencies, each corresponding to a specific type of pool (Wright, 1978, p. 91). The pairwise distances were calculated as follows:
$$MRD_{xy} = \sqrt{\frac{1}{2L}\sum_{j=1}^{L}\sum_{i}\left(p_{ij}^{x} - p_{ij}^{y}\right)^{2}}$$
where $MRD_{xy}$ is the distance between x and y; $L$ is the number of SNPs in the dataset; $p_{ij}^{x}$ is the frequency of the ith allele at the jth locus of sample x; and $p_{ij}^{y}$ is the frequency of the ith allele at the jth locus of sample y. The matrices were calculated using a custom R script.
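As an illustration of the distance described above, the following is a minimal sketch in Python, not the authors' custom R script. It assumes the standard form of the modified Rogers' distance (the square root of the summed squared allele-frequency differences divided by 2L) and complete data with no missing frequencies. The function names `modified_rogers_distance` and `mrd_matrix` and the input layout are hypothetical.

```python
import math


def modified_rogers_distance(freq_x, freq_y):
    """Modified Rogers' distance between two samples.

    freq_x, freq_y: one entry per locus; each entry is a list of allele
    frequencies at that locus (assumed layout, e.g. [0.25, 0.75] for a SNP).
    """
    if len(freq_x) != len(freq_y):
        raise ValueError("both samples must cover the same loci")
    n_loci = len(freq_x)
    total = 0.0
    for locus_x, locus_y in zip(freq_x, freq_y):
        if len(locus_x) != len(locus_y):
            raise ValueError("allele counts differ at a locus")
        # Sum of squared frequency differences over the alleles of this locus.
        total += sum((px - py) ** 2 for px, py in zip(locus_x, locus_y))
    return math.sqrt(total / (2 * n_loci))


def mrd_matrix(samples):
    """Symmetric pairwise MRD matrix for a dict {sample_name: frequencies}."""
    names = sorted(samples)
    matrix = [[0.0] * len(names) for _ in names]
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            d = modified_rogers_distance(samples[names[a]], samples[names[b]])
            matrix[a][b] = matrix[b][a] = d
    return names, matrix
```

With this definition, two samples fixed for opposite alleles at every SNP are at distance 1, and identical samples are at distance 0.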
We used several analytical techniques to characterize the genetic patterns within our dataset and to compare outputs across types of pools. Principal coordinate analysis (PCoA), a dimensionality-reduction method, was used to summarize the MRD matrix; it was run with the “gl.pcoa” function of the “dartR” package, generating a two-dimensional representation of the data. For clustering analysis, the MRD matrix was clustered with the complete linkage algorithm from the “stats” R package (V4.0.4) (R Core Team, 2022). The nodes of the resulting dendrogram were tested by bootstrap analysis with the “boot.phylo” function of the “ape” package (V5.4.1; Paradis and Schliep, 2019), using the parameters “rooted = FALSE” and “B = 1000”.
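To make the clustering step concrete, here is a minimal Python sketch of complete-linkage agglomerative clustering on a distance matrix. It is a naive illustration, not the R “stats” implementation the authors used. The function name `complete_linkage` is hypothetical, ties are broken by first occurrence (which may differ from other implementations), and the bootstrap test of the nodes is not covered.

```python
def complete_linkage(dist, labels):
    """Naive complete-linkage clustering.

    dist: symmetric matrix (list of lists) of pairwise distances.
    labels: item names, in the same order as the rows of dist.
    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    clusters = {i: (i,) for i in range(len(labels))}  # cluster id -> member indices
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            tuple(labels[i] for i in clusters[a]),
            tuple(labels[i] for i in clusters[b]),
            d,
        ))
        # Replace the two merged clusters with their union.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

Each merge height is the largest pairwise distance between the two groups being joined, so tight pairs of accessions join low in the dendrogram and distant groups join near the top.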
To explore population admixture, we compared the best estimate of K ancestral populations derived from all individuals, the seq-pools, or a single individual per accession. This comparison was conducted using the “snmf” function of the “LEA” R package (V3.2.0; Frichot and François, 2015). To run “snmf” with the seq-pools, the standard output from DArTseq was used because the input files for the “LEA” package are designed for allele counts, not allele frequencies. To run the analysis, the data (individuals, seq-pools, and single plants), stored as “genlight” objects, were transformed into STRUCTURE input files using the “gl2structure” function of the “dartR” package (with the option “exportMarkerNames = FALSE” and all others as default). The STRUCTURE-formatted files were then converted into the geno format with the “struct2geno” function of “LEA” (parameters: “ploidy = 2, FORMAT = 2, extra.row = 0, extra.column = 1”). The “snmf” method from the “LEA” package was then executed for each dataset with the following parameters: “K = 1:20, ploidy = 2, entropy = TRUE, CPU = 20, repetitions = 5, iterations = 500, alpha = 100”. The optimal K, indicating the most likely number of ancestral populations given the data, was determined using the cross-entropy criterion, selecting the point where the cross-entropy exhibited a plateau. Initially, the “snmf” run with individual samples did not display a plateau, which led to an additional run with K-values from 40 to 55. Visual representations, including bar plots of admixture coefficients and plots of cross-entropy values across K-values, were generated using the “ggplot2” package (V3.3.3; Wickham, 2016).
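The plateau criterion described above can be expressed as a simple rule, sketched below in Python. This is only an illustrative heuristic: the text does not state a numerical threshold, so the relative tolerance `rel_tol`, the use of the best (lowest) repetition per K, and the function name `choose_k_by_plateau` are all assumptions.

```python
def choose_k_by_plateau(cross_entropy, rel_tol=0.01):
    """Pick K at the start of the cross-entropy plateau.

    cross_entropy: dict {K: [cross-entropy of each repetition]}; values are
    assumed positive.
    rel_tol: assumed threshold; a step to the next tested K that lowers the
    best cross-entropy by less than this fraction counts as a plateau.
    Returns the chosen K, or None if no plateau appears in the tested range.
    """
    ks = sorted(cross_entropy)
    # Keep the best (lowest) cross-entropy among the repetitions for each K.
    best = {k: min(cross_entropy[k]) for k in ks}
    for k, k_next in zip(ks, ks[1:]):
        relative_gain = (best[k] - best[k_next]) / best[k]
        if relative_gain < rel_tol:
            return k
    return None
```

A `None` result corresponds to the situation reported for the individual samples, where no plateau was visible in the first range of K and a further run with larger K-values was needed.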