Population structure and differentiation in Petrosia ficiformis

Ana Riesgo
Sergi Taboada
Rocío Pérez-Portela
Paolo Melis
Joana R. Xavier
Gema Blasco
Susanna López-Legentil

Several methods were used to assess population structure and differentiation in P. ficiformis: two clustering approaches (STRUCTURE and Discriminant Analysis of Principal Components, DAPC) and four distance-based methods (FST estimation, Isolation by Distance (IBD), BARRIER, and Analysis of Molecular Variance (AMOVA)).

Individuals were assigned to populations using the Bayesian clustering approach implemented in STRUCTURE 2.3.4 [97], which estimates population allele frequencies and then assigns individuals to populations probabilistically, under the assumption of Hardy-Weinberg equilibrium (HWE) and/or linkage equilibrium within populations. We used the following settings: the admixture model, because although we had no previous knowledge of the origin of the populations studied, we assumed that a proportion of individuals could have recent ancestors from multiple populations; no locprior, because no additional sample-characteristic data were available; and correlated allele frequencies, because we had no previous knowledge of the correlation levels across populations [98]. The program was run with a burn-in of 100,000 iterations followed by 100,000 MCMC iterations, testing putative values of K (the predicted number of genetic units) from 2 to 20 (one cluster more than the number of sampling sites considered in the analysis), with twenty replicate runs. The log probabilities of the data, Pr(X | K), for each value of K were evaluated by calculating ∆K, which accounts for the rate of change in the log probability of data between successive K values and is currently considered a more reliable predictor of the true number of populations [99]. Convergence was assessed with the alpha parameter. ∆K was calculated and evaluated with STRUCTURE HARVESTER [100]. We then used the CLUMPAK web server [101] to find the major and minor best alignments of the results across the range of K values by averaging the probabilities of each K cluster. Graphs were visualized in CLUMPAK [101], and the major mode solution was selected.
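As an illustration of the ∆K criterion, the following minimal Python sketch computes ∆K in its commonly used form: the absolute second-order difference of the mean ln Pr(X | K) across replicate runs, divided by the standard deviation of ln Pr(X | K) at that K. The function name delta_k and all likelihood values are hypothetical, and STRUCTURE HARVESTER may differ in detail (for example, in how the standard deviation is computed). ∆K is undefined for the smallest and largest K tested, because it needs both neighbouring values.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K -> list of ln Pr(X | K) values from replicate runs.
    Assumption: sample standard deviation across replicates is used.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        has_neighbours = (k - 1) in means and (k + 1) in means
        if has_neighbours and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical ln Pr(X | K) values (three replicates per K), for illustration only.
example_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4400.0, -4404.0, -4396.0],
    4: [-4380.0, -4390.0, -4370.0],
}

dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)  # K with the highest Delta K
```

In this toy data set the mean log probability rises steeply from K = 1 to K = 2 and then levels off, so the largest ∆K falls at K = 2.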

To further assess population differentiation, the multivariate DAPC method was applied using the adegenet 2.1.1 package [102] implemented in R 2.14 [103]. DAPC defines clusters by running the k-means algorithm on data transformed by Principal Component Analysis (PCA), sequentially with increasing values of k. The resulting clustering solutions are compared using the Bayesian Information Criterion (BIC), with the optimal solution corresponding to the lowest BIC value. Before performing the analysis, the optimal number of PCs to retain was explored with the cross-validation method implemented in the same package.
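The BIC-based choice of k can be sketched as follows. This Python fragment only illustrates the selection step, not the k-means or PCA steps, and it assumes one common BIC form for k-means solutions, n ln(WSS / n) + k ln(n), where WSS is the within-cluster sum of squares. Whether adegenet uses exactly this expression should be checked against its documentation. The WSS values and function names are hypothetical.

```python
import math


def bic_from_wss(wss, n_obs, n_clusters):
    """BIC for a k-means solution, assuming the form n*ln(WSS/n) + k*ln(n)."""
    return n_obs * math.log(wss / n_obs) + math.log(n_obs) * n_clusters


def select_k(wss_by_k, n_obs):
    """Return the k with the lowest BIC together with all BIC values."""
    bic = {k: bic_from_wss(wss, n_obs, k) for k, wss in wss_by_k.items()}
    return min(bic, key=bic.get), bic


# Hypothetical within-cluster sums of squares for k = 1..5 and 100 individuals.
example_wss = {1: 1000.0, 2: 600.0, 3: 400.0, 4: 390.0, 5: 385.0}
chosen_k, bic_values = select_k(example_wss, n_obs=100)
```

Here the WSS stops improving appreciably after k = 3, so the penalty term k ln(n) outweighs the gain in fit for larger k and the lowest BIC is found at k = 3.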

Pairwise population differentiation between sampling sites was estimated with the FST statistic under an infinite allele model (IAM) in Arlequin 3.5 [104]. The significance of FST values was tested with 20,000 permutations and corrected using the Benjamini-Yekutieli (B-Y) [105] False Discovery Rate (FDR) approach as described in [106]. In addition, the frequency of null alleles was estimated with Microchecker 2.2.3 [107]. Only the microsatellite 17PETRO contained null alleles, so we corrected allele frequencies and FST values accordingly, using the ENA method [108] implemented in FreeNA [109]. Global FST and average FST per population following [110], with UPGMA clustering of populations, were obtained using the R packages adegenet 2.1.1 [102] and hierfstat v0.04–22 [111].
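For readers unfamiliar with the B-Y correction, the sketch below shows the standard Benjamini-Yekutieli step-up adjustment in Python. It is a generic illustration and may differ from the variant described in [106], which is sometimes applied as a single critical value rather than as adjusted p-values. The function name and the p-values are hypothetical.

```python
def benjamini_yekutieli(pvalues):
    """Return B-Y adjusted p-values in the same order as the input.

    Each sorted p-value p_(i) is scaled by m * c(m) / i, where
    c(m) = sum_{k=1..m} 1/k, and monotonicity is enforced from the
    largest p-value downwards. Values are capped at 1.
    """
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four pairwise FST comparisons.
raw_p = [0.001, 0.02, 0.04, 0.5]
adj_p = benjamini_yekutieli(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

With many pairwise comparisons the harmonic factor c(m) grows slowly (roughly ln m), which makes B-Y more conservative than Benjamini-Hochberg but less so than Bonferroni.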

To determine whether genetic differentiation was driven by geographical distance, creating a pattern of IBD, linearized pairwise FST estimates (FST / (1 − FST)) were correlated against log-transformed geographical distances between samples [112]. We used a Mantel test with all sites together and stratified Mantel tests using the clusters separated by the oceanographic barriers identified below, both run in GENODIVE version 2.0b23 [95]. Geographical distances were estimated as the minimum linear distance by sea between pairs of locations. Furthermore, to localize genetic breaks in the population structure of P. ficiformis (i.e., oceanographic fronts), pairwise FST values and sampling-site coordinates were analysed in BARRIER v2.2 [113]. BARRIER links the matrix of geographical coordinates with the corresponding distance matrix (FST) and applies Monmonier's maximum difference algorithm to identify 'barriers' to gene flow among sites, namely the zones where differences between pairs of sites are largest.
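The IBD test described above can be illustrated with a simple Mantel test in Python. This is a generic, unstratified, one-tailed permutation version written for clarity, not a reproduction of GENODIVE's implementation. All function names, site positions, and FST values are hypothetical.

```python
import math
import random


def linearize_fst(fst):
    """Linearized differentiation FST / (1 - FST)."""
    return fst / (1.0 - fst)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=1):
    """One-tailed Mantel test: permute site labels of the genetic matrix."""
    rng = random.Random(seed)
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sites along a coastline (positions in km), not the study's sites.
positions_km = [0.0, 10.0, 50.0, 200.0, 1000.0]
log_dist = [[math.log(abs(a - b)) if a != b else 0.0 for b in positions_km]
            for a in positions_km]
# Made-up FST values built so that FST / (1 - FST) = 0.01 * ln(distance).
fst = [[(0.01 * d) / (1.0 + 0.01 * d) for d in row] for row in log_dist]
lin_fst = [[linearize_fst(v) for v in row] for row in fst]
r_mantel, p_mantel = mantel_test(lin_fst, log_dist)
```

Because the toy FST values are constructed as an exact linear function of log distance, the observed correlation is essentially 1. Real data would show scatter around the regression line, and a stratified test would restrict permutations to sites within the same cluster.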

Finally, an Analysis of Molecular Variance (AMOVA) was performed in GENODIVE version 2.0b23 [95] to determine the hierarchical distribution of genetic variation. To identify the source of variation underlying the genetic differentiation, we defined several groupings a priori: 1) two groups: Atlantic (SMI, FLO, CAN, MAD) vs. Mediterranean (CART, CAR, BLA, FEL, ULL, ESC, MRS, NIZ, LIG, NAP, SLO, SCRO, JECRO, CRE, ISR) populations; 2) three groups: Atlantic (SMI, FLO, CAN, MAD), Western Mediterranean (CART, CAR, BLA, FEL, ULL, ESC, MRS, NIZ, LIG, NAP), and Eastern Mediterranean (SLO, SCRO, JECRO, CRE, ISR) populations; and 3) four groups: Azores (SMI, FLO), Madeira and Canary Islands (CAN, MAD), Western Mediterranean (CART, CAR, BLA, FEL, ULL, ESC, MRS, NIZ, LIG, NAP), and Eastern Mediterranean (SLO, SCRO, JECRO, CRE, ISR) populations. The significance of the AMOVAs was assessed with 10,000 permutations of the original data.
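The three grouping schemes listed above are nested, which is easy to see when they are written as a data structure. The Python sketch below encodes them using the site codes from the text and checks that every scheme assigns each of the 19 sampling sites to exactly one group. The variable and function names are hypothetical, and the fragment does not compute the AMOVA itself.

```python
# Site codes as listed in the text, arranged by the finest (four-group) scheme.
AZORES = ["SMI", "FLO"]
MADEIRA_CANARIES = ["CAN", "MAD"]
WEST_MED = ["CART", "CAR", "BLA", "FEL", "ULL", "ESC", "MRS", "NIZ", "LIG", "NAP"]
EAST_MED = ["SLO", "SCRO", "JECRO", "CRE", "ISR"]

ALL_SITES = AZORES + MADEIRA_CANARIES + WEST_MED + EAST_MED

# Coarser schemes are built by merging groups of the finer one.
SCHEMES = {
    "two_groups": {
        "Atlantic": AZORES + MADEIRA_CANARIES,
        "Mediterranean": WEST_MED + EAST_MED,
    },
    "three_groups": {
        "Atlantic": AZORES + MADEIRA_CANARIES,
        "Western Mediterranean": WEST_MED,
        "Eastern Mediterranean": EAST_MED,
    },
    "four_groups": {
        "Azores": AZORES,
        "Madeira and Canary Islands": MADEIRA_CANARIES,
        "Western Mediterranean": WEST_MED,
        "Eastern Mediterranean": EAST_MED,
    },
}


def is_partition(groups, sites):
    """True if every site appears in exactly one group and no extras exist."""
    assigned = [site for members in groups.values() for site in members]
    return sorted(assigned) == sorted(sites)


scheme_is_valid = {name: is_partition(groups, ALL_SITES)
                   for name, groups in SCHEMES.items()}
```

A check of this kind helps catch a site that is accidentally omitted or duplicated when preparing group definitions for an AMOVA input file.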
