Statistical treatment of data

Paolo Ruggeri; Andrea Splendiani; Cristina Di Muri; Tatiana Fioravanti; Alberto Santojanni; Iole Leonori; Andrea De Felice; Ilaria Biagiotti; Piera Carpi; Enrico Arneri; Paola Nisi Cerioni; Massimo Giovannotti; Vincenzo Caputo Barucchi

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

Statistical treatment of data

PR Paolo Ruggeri

AS Andrea Splendiani

CM Cristina Di Muri

TF Tatiana Fioravanti

AS Alberto Santojanni

IL Iole Leonori

AF Andrea De Felice

IB Ilaria Biagiotti

PC Piera Carpi

EA Enrico Arneri

PC Paola Nisi Cerioni

MG Massimo Giovannotti

VB Vincenzo Caputo Barucchi

This method is extracted from research article: PLoS One, Mar 2016

Coupling Demographic and Genetic Variability from Archived Collections of European Anchovy (Engraulis encrasicolus)

DOI: 10.1371/journal.pone.0151507

Request a Protocol

Ask a question

Favorite

We estimated genetic diversities from genotypic and allelic frequencies only after checking the quality of the genotypes (amplification success rate and genotype consistency) and the occurrence of null alleles and other genotyping errors (allele dropout and stutter peaks) using MICRO-CHECKER 2.2.1 [38]. Loci affected by null alleles were corrected following the Brookfield algorithm [39]. Mean number of alleles observed for each locus (N_A), allelic richness (R_S), observed (H_o) and expected (H_e) heterozygosities, and the inbreeding coefficient (F_IS) were estimated with FSTAT 2.9.3 [40]. Linkage disequilibrium between loci was tested with a Monte Carlo Markov Chain (MCMC) test executed by 1000 batches of 2000 iterations each, using Genepop 4.0.10 [41]. A sequential Bonferroni adjustment of P-values was applied to account for an increase in type-I error from multiple comparisons [42]. Outlier loci were detected by departures from the neutral expectations with fdist [43] implemented in Lositan [44] and a LnRH method developed for microsatellites [45] (S1 File).

Temporal variation in genetic diversity was quantified using estimates of expected (H_e) and observed (H_o) heterozygosity and the expected number of alleles (N_A). Since estimates of genetic diversity can be biased by differences in sample sizes, the software POPTOOLS 3.1.0 [46] (http://www.cse.csiro.au/poptools) was used to standardize and analyze temporal trends in these variables. Estimates were standardized by generating 1000 'real' samples (n = 24) with sampling without replacement for each year and location (Chioggia and Vieste). A total of 9000 “real” samples were generated. In addition, 9000 “randomized” samples were generated pooling together the samples from the Chioggia and Vieste time series. Statistical significances of temporal trends were estimated by calculating the slope (b) and the Pearson’s correlation coefficient (r) of linear regressions of H_e, H_o, and N_A against years. The r and b estimates of "real" parameters for each year in both the Chioggia and Vieste time series were compared with randomized values, using MCMC chains of 1000 iterations. We evaluated the significance of trends in H_e, H_o, and N_A over the entire time-series and between consecutive years.

Effective population size (N_e) was estimated using a moment-based temporal method and a coalescent likelihood-based method. These methods are generally robust and are often used on historical time series [13]. Both temporal methods use changes in allele frequencies between two samples separated by a known number of generations. Since anchovies become mature at one year of age [24], one generation per year was assumed. Following [47], N_e was also estimated over the whole timeframe of each time series to detect consistency with estimates obtained from intermediate temporal timeframes. NE-ESTIMATOR 2.01 [48] was used to estimate moment-based N_e values. We implemented the following procedures: i) we used Plan II sampling in which individuals were removed after sampling, ii) we used the Pollak formula [49] for computing the standardized variance in allele frequency (F_K) to estimate N_e values, iii) we estimated 95% confidence intervals from jackknifing, and iv) we used a threshold of 0.01 as the lowest allele frequency considered to estimate N_e values, because allele frequencies were seldom lower than 0.01. The TM3 software [50] was used to obtain coalescent likelihood-based estimates of N_e. TM3 uses a Bayesian approach based on coalescent theory and MCMC simulations to generate a posterior distribution of N_e values. In addition, we used maximum N_e values (Ne MAX) and the number of generations between consecutive samples (T) as priors in the TM3 simulations. To test for consistency between simulations, three independent runs were performed using Ne MAXes equal to 1000, 5000, and 10,000. Simulations were performed with 20,000 MCMC iterations and were conducted separately for Chioggia and Vieste samples.

Estimates of N_e from the moment-based temporal method were used to provide temporal trends in N_e/N_c ratios from both time series. Estimates of N_c were obtained from annual acoustic surveys (MEDIAS Project, http://www.medias-project.eu; Fig 2) [22, 23] for stocks in the northern and middle-southern Adriatic Sea.

(a) trend of abundance in the northern Adriatic Sea. (b) trend of abundance in the middle-southern Adriatic Sea.

Datasets were tested for genetic population bottleneck signatures with a heterozygosity-excess based method implemented in BOTTLENECK 1.2 [51]. This test assumes that during a strong reduction in population size, allele numbers decline faster than H_e. Since expected heterozygosity at mutation–drift equilibrium (H_eq) is calculated from the number of observed alleles, H_e becomes larger than H_eq, leading to heterozygosity excess (H_exc) [52]. Values of H_exc were tested with Wilcoxon’s signed rank tests [53], a powerful and robust test when the dataset contains few (< 20) polymorphic loci [51]. We used 1000 iterations and three mutational models: infinite allele model (IAM), stepwise mutation model (SMM), and a two phase mutation model (TPM) with 95% single-step mutations and 5% multistep mutations, as recommended in [51].

In addition, purported bottlenecks were verified by simulating bottlenecked populations using BOTTLESIM 2.6 [54]. Genotypic data for CH78 and VI85 were used independently as founder populations to simulate bottleneck events lasting 32 and 25 generations, respectively. Population declines were tested using 1000 iterations with the following criteria: dioecious organisms with random mating system, balanced sex ratio (1:1), two years of expected lifespan for the species, given its high natural mortality [19], one year at first reproduction [24], and 90% generational overlap. The moment-based N_e estimates from NE-ESTIMATOR were used as analogues of population size. Simulated values of H_e and N_A were plotted and compared with observed H_e and N_A estimates to provide evidence of a population bottleneck.

Finally, we estimated the degree of geographic and temporal variability among samples first with pairwise values of θ_ST [55], calculated with FSTAT and second with a Bayesian approach, implemented in STRUCTURE 2.3.2.1 [56, 57]. The second approach estimates the number of populations (K) under Hardy–Weinberg expectations and linkage equilibrium. The most probable number was tested using priors ranging from K = 1 to K = 6, under an admixture model and with correlated allele frequencies. Ten independent runs were performed for each K using an MCMC of 500,000 iterations after a burn-in of 50,000 iterations.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

0/150

tip Tips for asking effective questions

+ Description

Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol