Multiple alignments and PAML statistical analysis

Dennis Kappei
Marion Scheibe
Maciej Paszkowski-Rogacz
Alina Bluhm
Toni Ingolf Gossmann
Sabrina Dietz
Mario Dejung
Holger Herlyn
Frank Buchholz
Matthias Mann
Falk Butter

DNA and protein sequences of human TERF1 and TERF2 orthologues from up to 24 vertebrate species were obtained from the ENSEMBL database⁴² (release 75), including all species from the MS screen except axolotl, for which there is currently no published genome available. To obtain multiple DNA sequence alignments, the corresponding protein sequences were aligned using MUSCLE⁴³ (version 3.8.31), and PAL2NAL⁴⁴ (version 14) was used to convert the protein alignments into codon alignments and to remove gaps. Because whole-protein alignments for highly divergent species are difficult to obtain, we restricted the analyses to domain-specific alignments based on the human domain annotation. Sequences were manually inspected, and the homeobox and TRFH domains of TERF1 and TERF2 were analysed separately. Species for which the respective domain was not fully sequenced were excluded from further analysis. The exact species used for the analysis of the four domains are listed in the corresponding figures and tables (Fig. 3c, Supplementary Fig. 4, Supplementary Tables 1 and 2). Substitution rates were calculated using PAML⁴⁵ (version 4.7) to obtain the ratio of non-synonymous to synonymous substitution rates (dN/dS = ω). Values of ω < 1, ω = 1 and ω > 1 indicate purifying selection, neutral evolution and diversifying (positive) selection, respectively. A branch-site model (model D) was applied and compared with a homogeneous site model (discrete model M3) and with a model D that assumes neutral evolution for a predefined set of branches (for example, for the therian clade). In particular, we used a three-site-class model, because it differed highly significantly from a discrete two-site-class model, indicating heterogeneous levels of purifying selection within the protein domains.
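The codon-alignment step described above (a protein alignment threaded back onto the coding sequences, with gapped columns removed) can be illustrated with a minimal Python sketch. This is not PAL2NAL's implementation; the function names `thread_codons` and `codon_alignment` and the toy sequences are hypothetical, and the sketch assumes each coding sequence is in frame, has no stop codon, and matches its protein residue for residue.

```python
from typing import Dict, List


def thread_codons(aligned_protein: str, cds: str) -> List[str]:
    """Return one codon (or '---') per column of a gapped protein sequence."""
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    if len(cds) != 3 * n_residues:
        # Assumption: the CDS is in frame and carries no stop codon.
        raise ValueError("CDS length does not match the protein length")
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # alignment gap spans a whole codon
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return codons


def codon_alignment(aligned_proteins: Dict[str, str],
                    cds_by_name: Dict[str, str]) -> Dict[str, str]:
    """Build a codon alignment and drop every column that has a gap."""
    threaded = {name: thread_codons(protein, cds_by_name[name])
                for name, protein in aligned_proteins.items()}
    n_columns = len(next(iter(threaded.values())))
    keep = [i for i in range(n_columns)
            if all(row[i] != "---" for row in threaded.values())]
    return {name: "".join(row[i] for i in keep)
            for name, row in threaded.items()}


# Toy input (hypothetical sequences, not real TERF1/TERF2 data).
toy_proteins = {"sp_a": "MK-L", "sp_b": "MKTL"}
toy_cds = {"sp_a": "ATGAAACTG", "sp_b": "ATGAAGACCCTG"}
toy_result = codon_alignment(toy_proteins, toy_cds)
print(toy_result)  # {'sp_a': 'ATGAAACTG', 'sp_b': 'ATGAAGCTG'}
```

In the toy input, the third column contains a gap in `sp_a`, so the corresponding codon (`ACC`) is removed from `sp_b` as well and both sequences end up with three codons.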
Differences between models were assessed for significance by likelihood-ratio tests, which assume that the test statistic 2ΔlnL is approximately χ²-distributed, with degrees of freedom equal to the difference in the number of free parameters between the two models.
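As a worked illustration of the likelihood-ratio test, the following Python sketch computes 2ΔlnL and the corresponding χ² p-value from two log-likelihoods. The function names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the log-likelihoods and parameter counts in the example are made-up numbers, not results from this study. The χ² tail probability is computed with the standard library only, using the closed-form recurrence for integer degrees of freedom.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    z = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float,
                          n_params_null: int, n_params_alt: int):
    """Return (2*delta_lnL, df, p) for a simpler model nested in a richer one."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = n_params_alt - n_params_null
    return stat, df, chi2_sf(stat, df)


# Made-up values for illustration only.
stat, df, p_value = likelihood_ratio_test(-2000.0, -1990.0, 20, 22)
print(stat, df, p_value)
```

With these made-up inputs, 2ΔlnL = 20 and df = 2, so the p-value reduces to exp(−10), roughly 4.5 × 10⁻⁵.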
