Statistics of the ecological analysis

Yujun Cui
Boris V. Schmid
Hanli Cao
Xiang Dai
Zongmin Du
W. Ryan Easterday
Haihong Fang
Chenyi Guo
Shanqian Huang
Wanbing Liu
Zhizhen Qi
Yajun Song
Huaiyu Tian
Min Wang
Yarong Wu
Bing Xu
Chao Yang
Jing Yang
Xianwei Yang
Qingwen Zhang
Kjetill S. Jakobsen
Yujiang Zhang
Nils Chr. Stenseth
Ruifu Yang

Among all 446 Y. pestis genomes analyzed in this study, the sequence of the rpoZ gene is fully identical in 403 genomes. We defined the strains carrying this major rpoZ allele type as “rpoZ reference” (identical to the CO92 reference strain), and the other 43 genomes, which carried minor allele types of rpoZ, as “rpoZ variants”. As described in the article, we used permutation testing (i.e., resampling without replacement, also known as re-randomization testing) to examine whether the climate in Guertu was different (that is, colder, warmer, wetter, or drier, or one of the four possible combinations thereof) during the estimated time periods since the phylogenetic branches containing the rpoZ variants split off from the main tree, compared with the climate in the time periods preceding the rpoZ references.

We made the climate period compared between samples somewhat longer than the branch length indicated by the phylogenetic tree. We did so because the exact sampling dates are not known (we only know that sampling occurred in June, July, or August), and we did not want to accidentally exclude months that might have been relevant for the selection of rpoZ variants. All samples were therefore treated as having been sampled on the 1st of September, which adds three months to the duration indicated by the branch length to cover the sampling period. We also extended each period backwards to include the whole of its first month (e.g., if the branch length indicated that the climate should be considered back to the 25th of June 1979, we included the whole of June). Finally, we added a full year to the period considered, to allow for trophic cascade effects, in which the climate exerts its selection pressure by affecting the conditions for rodent and/or flea populations in a way that takes time to show up as changes in rodent and flea densities (ref. 22).
On average, the duration of the climate period considered for each sample was therefore the branch length indicated by the phylogenetic tree (on average 4.9 years) + 3 months + 0.5 month (the average effect of rounding to whole months) + 1 year, which adds up to approximately 6 years, 2 months, and 10 days, counting back from the 1st of September.
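
The construction of the climate period for a single sample can be sketched as follows. This is only an illustration of one reading of the description above, not the code from the repository: the function name `climate_window`, the parameter names, and the conversion of fractional years to days with a 365.25-day year are all assumptions.

```python
from datetime import date, timedelta

# Assumption: fractional years are converted to days with a 365.25-day year.
DAYS_PER_YEAR = 365.25


def climate_window(sampling_year, branch_length_years, lag_years=1.0):
    """Return (start, end) of the climate period for one sample.

    The period ends on the 1st of September of the sampling year and reaches
    back over the branch length, plus 3 months for the sampling season, plus
    a lag (1 year by default) for trophic cascade effects. The start is then
    moved back to the first day of its month.
    """
    end = date(sampling_year, 9, 1)
    span_years = branch_length_years + 3 / 12 + lag_years
    raw_start = end - timedelta(days=round(span_years * DAYS_PER_YEAR))
    start = raw_start.replace(day=1)  # include the whole first month
    return start, end


# Hypothetical sample: collected in the summer of 1985, branch length 4.9 years.
start, end = climate_window(1985, 4.9)
print(start, end)
```

Moving the start back to the first of the month adds between 0 and roughly 30 days, which is where the average of 0.5 month in the sum above comes from.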

The divergence dates of the internal nodes (and thus the branch lengths between the main tree and the sampling dates of the rpoZ variants and the rpoZ references) in trees generated by BEAST come with a substantial amount of uncertainty. To capture some of this uncertainty in our analysis, we used the logfile of phylogenetic trees generated by the MCMC process of BEAST, under the same settings as were used in a later iteration of the phylogenetic analysis that used BEAST2 (i.e., statistically the same output). This logfile gave us 8000 potential phylogenetic trees to work with (the logfile held 10,000 trees collected during the MCMC, each 6000 iterations apart, and we dropped the first 20%). In each of these trees, we calculated the average temperature and precipitation (at a monthly resolution) for the eight rpoZ variants combined and for eight randomly selected (without replacement) rpoZ reference samples combined, and recorded how the randomly selected set compared with the eight rpoZ variant samples in terms of precipitation, temperature, or combinations thereof. We iterated 300 times over all 8000 trees, resulting in a permutation test with a total of 2.4 million iterations. In total, we tested eight climate hypotheses: rpoZ variants arise during warmer, colder, drier, or wetter years, or one of the four possible combinations thereof (warmer and wetter, warmer and drier, etc.), compared with the rpoZ references.
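
A minimal sketch of this permutation test is given below, on toy data. It is not the repository code: the function name `permutation_test`, the data layout, and in particular the way each P-value is scored (the share of random reference draws that are at least as extreme as the variant set in the direction of the hypothesis) are assumptions made for illustration.

```python
import random
from statistics import mean

# Each hypothesis is a predicate on (dt, dp): the mean temperature and mean
# precipitation of the random reference set minus those of the variant set.
# Assumption: a draw counts against a hypothesis when the random set is at
# least as extreme as the variants in the hypothesised direction.
HYPOTHESES = {
    "warmer": lambda dt, dp: dt >= 0,
    "colder": lambda dt, dp: dt <= 0,
    "wetter": lambda dt, dp: dp >= 0,
    "drier": lambda dt, dp: dp <= 0,
    "warmer_wetter": lambda dt, dp: dt >= 0 and dp >= 0,
    "warmer_drier": lambda dt, dp: dt >= 0 and dp <= 0,
    "colder_wetter": lambda dt, dp: dt <= 0 and dp >= 0,
    "colder_drier": lambda dt, dp: dt <= 0 and dp <= 0,
}


def permutation_test(trees, n_repeats, rng):
    """Return one P-value per hypothesis.

    trees: list of (variants, references) pairs, one pair per tree. Each
    sample is a (temperature, precipitation) tuple averaged over that
    sample's climate period in that tree.
    """
    counts = dict.fromkeys(HYPOTHESES, 0)
    total = 0
    for _repeat in range(n_repeats):
        for variants, references in trees:
            var_t = mean(s[0] for s in variants)
            var_p = mean(s[1] for s in variants)
            # Draw as many references as there are variants, without replacement.
            drawn = rng.sample(references, k=len(variants))
            dt = mean(s[0] for s in drawn) - var_t
            dp = mean(s[1] for s in drawn) - var_p
            for name, holds in HYPOTHESES.items():
                counts[name] += holds(dt, dp)
            total += 1
    return {name: counts[name] / total for name in HYPOTHESES}


# Toy data: 2 trees, 2 variants and 3 references each (the study used
# 8000 trees, 8 variants, and 300 repeats).
toy_trees = [
    ([(10.0, 5.0), (12.0, 5.0)], [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]),
    ([(11.0, 5.0), (11.0, 5.0)], [(0.5, 5.0), (1.5, 5.0), (2.5, 5.0)]),
]
p_values = permutation_test(toy_trees, n_repeats=50, rng=random.Random(0))
print(p_values)
```

In the toy data the variant periods are always warmer than any reference draw, so the "warmer" P-value is 0 and the "colder" P-value is 1; precipitation is identical everywhere, so the ties count towards both "wetter" and "drier".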

Because our eight climate hypotheses partially overlap with each other, a Bonferroni correction would be overly stringent. As indicated in the article, we corrected for multiple testing with another layer of permutation testing. In this layer we repeatedly marked eight randomly selected samples (without replacement) as our “samples of interest”, reran the permutation test described above, now comparing the eight selected samples against eight randomly selected samples from the “not samples of interest” pool, and scored how often we would have found by chance an equal or stronger P-value for at least one of the eight climate hypotheses (Fig. 2b). For each of 4000 iterations, we picked a new random set of eight “samples of interest”, and for each of these iterations we ran the permutation test described above for 24,000 iterations. The code used is available in the code repository.
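
The outer layer of this correction can be sketched as below. Again this is an illustration under stated assumptions rather than the repository code: `familywise_p_value`, its parameters, and the stand-in `toy_min_p` are hypothetical names, and "equal or stronger" is taken to mean that the smallest of the eight P-values is less than or equal to the observed one.

```python
import random


def familywise_p_value(observed_p, samples, k, n_outer, min_p_for, rng):
    """Share of random relabelings that yield an equal or stronger P-value.

    min_p_for(of_interest, others) is assumed to run the inner permutation
    test and return the smallest P-value over all climate hypotheses.
    """
    hits = 0
    for _ in range(n_outer):
        # Mark k samples as "samples of interest", without replacement.
        picked = set(rng.sample(range(len(samples)), k))
        of_interest = [s for i, s in enumerate(samples) if i in picked]
        others = [s for i, s in enumerate(samples) if i not in picked]
        if min_p_for(of_interest, others) <= observed_p:
            hits += 1
    return hits / n_outer


def toy_min_p(of_interest, others):
    # Hypothetical stand-in for the inner permutation test: the share of
    # the remaining samples that are at least as large as the mean of the
    # samples of interest.
    m = sum(of_interest) / len(of_interest)
    return sum(1 for x in others if x >= m) / len(others)


# Toy data: 20 samples with a single made-up climate value each
# (the study used k = 8 and 4000 outer iterations).
toy_samples = [float(i) for i in range(20)]
corrected = familywise_p_value(
    observed_p=0.1, samples=toy_samples, k=8, n_outer=200,
    min_p_for=toy_min_p, rng=random.Random(0),
)
print(corrected)
```

Passing the inner test in as a function keeps the two layers separate, so the same outer loop works with any inner test that returns a minimum P-value.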
