Statistical analysis

Gianni M. Castiglione
Zhenhua Xu
Lingli Zhou
Elia J. Duh

We used codon-based likelihood models of molecular evolution from the PAML 4.7 software package to characterize the evolutionary rates of mammalian and avian KEAP1, as well as avian GSTA2. For these analyses, we constructed a species tree using established relationships (refs. 69, 70; Supplementary Figs. 7 and 8). First, we estimated evolutionary rates (dN/dS) within the mammalian (KEAP1) and avian (GSTA2) datasets independently using the random-sites models (M1, M2, M3, M7, M8) implemented in the CODEML program (ref. 71; Supplementary Tables 2 and 9). Avian GSTA2 sites predicted to be in the positive selection site class of M8 were identified by high posterior probabilities from Bayes empirical Bayes analysis (ref. 72).

To identify evidence of genetic recombination in our avian KEAP1 dataset, we employed a maximum likelihood method (GARD) implemented in the HyPhy Datamonkey server (refs. 73, 74). Based on the evidence for recombination (Supplementary Table 3), we created a tetrapod KEAP1 dataset consisting only of phylogenetically congruous KEAP1 coding sequence (631–1411, human KEAP1 numbering). We analyzed this pruned dataset using PAML Clade model D (CmD) to explicitly test for long-term shifts in evolutionary rates (dN/dS) between foreground and background clades within the tetrapod KEAP1 dataset (Supplementary Table 5; ref. 75). In any partitioning scheme, all non-foreground data are placed in the background partition. The foreground partitions are listed after the underscore for the clade models (e.g., CmC_Birds vs. Mammals). M3 with three site classes was used as the null model for CmD. All pairs of random-sites models and clade models were evaluated for significance by likelihood ratio tests against a χ² distribution.
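As a minimal sketch of the likelihood ratio test described above: twice the difference in log-likelihood between the alternative and null models is compared against a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The log-likelihood values and degrees of freedom below are placeholders, not results from this study, and the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical. In practice the log-likelihoods would be read from the CODEML output for each model pair.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (LRT statistic, p value) for a pair of nested models."""
    # A slightly negative statistic can arise from optimizer noise; clamp to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Placeholder log-likelihoods and df for one hypothetical nested model pair.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")  # p is about 0.0067
```

The closed-form tail probability avoids a dependency on a statistics library; with SciPy available, `scipy.stats.chi2.sf` would serve the same purpose.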

We obtained body mass and BMR (kJ/h) data for >530 avian species from a published dataset stringently curated to eliminate sources of variation in BMR (such as inactivity, seasonality, and thermoneutrality; ref. 9). We calculated MS-BMR as the ratio of BMR to mass. We collected maximum lifespan data for >1000 avian species from the Human Ageing Genomic Resources AnAge database (ref. 76). The final dataset prepared for statistical analysis was obtained by cross-referencing the BMR and lifespan datasets, yielding 206 species. Over 97% of species in this consolidated dataset had lifespan data designated as “acceptable” or “high” by AnAge curators (ref. 76). To increase sample size, we included “questionable” lifespan data points, since these reflected conservative estimates of Neoaves lifespan (i.e., values were lower than those of their closest relatives in the same genus or order). Basal Aves and Neoaves species were stratified according to lifespan and MS-BMR intervals. The mean lifespan and MS-BMR in each respective grouping were statistically indistinguishable [Mann–Whitney, Supplementary Table 11; binned MS-BMR and lifespan data were found to be non-normal, even after various transformation attempts (as assessed by a Ryan–Joiner test in Minitab 19)]. We first evaluated statistical significance by conventional Kruskal–Wallis (KW) tests with KEAP1 functional status as the categorical factor in our model (coded 0 or 1) and either MS-BMR or lifespan data as the response (Minitab 19). Neoaves vs. basal Aves groupings with conventionally significant p values (<0.05) were selected for PI statistical analysis. This was conducted via the computer simulation method (ref. 77) using a time-calibrated avian phylogeny that we constructed by following relationships and node dates from recent fossil-calibrated phylogenomic studies (refs. 70, 78). We constrained branch lengths to these node dates and resolved polytomies by reference to the time tree of life (ref. 79).
Using this phylogeny, we conducted 1000 simulations of continuous character evolution using a model of Brownian motion in MESQUITE (ref. 80). We used these simulations to generate empirical null distributions of KW H values through manual analysis in Minitab 19, ensuring the simulated characters originated from the same phylogenetic position as the species analyzed within a given KW test. The 95th percentile of the empirical null H value distribution was used as the significance threshold for our KW analyses.
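The simulation-based threshold described above can be illustrated with a small sketch. It is not the MESQUITE and Minitab workflow used here: the five-tip tree, the 0/1 status coding, the trait values, and all function names are hypothetical stand-ins, and the nearest-rank percentile and strict `>` comparison are assumptions. The sketch simulates a Brownian-motion trait on a fixed tree, recomputes the KW H statistic for the same tip grouping in each simulation, and takes the 95th percentile of those H values as the threshold.

```python
import math
import random


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of samples."""
    pooled = sorted((v, g) for g, sample in enumerate(groups) for v in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical toy tree, NOT the avian phylogeny from the study.
# Each node is (branch_length, tip_name) or (branch_length, [child nodes]).
TOY_TREE = (0.0, [
    (2.0, [(1.0, "sp1"), (1.0, "sp2")]),
    (1.0, [(2.0, "sp3"), (1.0, [(1.0, "sp4"), (1.0, "sp5")])]),
])
# Hypothetical binary factor (0 or 1) assigned to each tip.
TOY_STATUS = {"sp1": 0, "sp2": 0, "sp3": 1, "sp4": 1, "sp5": 1}


def simulate_bm(node, rng, start=0.0, sigma=1.0, tips=None):
    """Simulate one Brownian-motion trait; return {tip_name: value}."""
    if tips is None:
        tips = {}
    length, payload = node
    # Change along a branch is Normal(0, sigma^2 * branch_length).
    value = start + rng.gauss(0.0, sigma * math.sqrt(length))
    if isinstance(payload, str):
        tips[payload] = value
    else:
        for child in payload:
            simulate_bm(child, rng, value, sigma, tips)
    return tips


def split_by_status(trait, status):
    """Group tip values by the 0/1 factor, matching tips by name."""
    return [[trait[sp] for sp in status if status[sp] == level] for level in (0, 1)]


def null_h_threshold(tree, status, n_sims=1000, quantile=0.95, seed=0):
    """Nearest-rank percentile of H under Brownian motion on the tree."""
    rng = random.Random(seed)
    null_h = sorted(
        kruskal_wallis_h(split_by_status(simulate_bm(tree, rng), status))
        for _ in range(n_sims)
    )
    index = min(n_sims - 1, max(0, math.ceil(quantile * n_sims) - 1))
    return null_h[index]


# Made-up trait values for the toy tips.
observed_trait = {"sp1": 9.1, "sp2": 8.7, "sp3": 4.2, "sp4": 5.0, "sp5": 3.9}
observed_h = kruskal_wallis_h(split_by_status(observed_trait, TOY_STATUS))
threshold = null_h_threshold(TOY_TREE, TOY_STATUS)
print(f"observed H = {observed_h:.3f}, threshold = {threshold:.3f}, "
      f"exceeds threshold: {observed_h > threshold}")
```

Because related tips inherit correlated trait values, large H values arise by chance more often under this null than under a conventional KW test, which is why the threshold is taken from simulations on the tree rather than from a χ² table.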
