Identification of convergent and divergent sites

The coding sequences for each orthologous gene in our data set were aligned using PRANK (45) and translated into amino acid sequences. On the basis of the species tree (Fig. 1A), we inferred ancestral sequences for each gene using maximum likelihood and empirical Bayesian approaches (30) and counted the numbers of convergent and divergent substitutions along the branch pairs of interest (Fig. 1A). To exclude the effects of sequencing errors, incorrect alignments, and nonorthologous regions in the alignments on the identified convergent and divergent sites, we removed a convergent/divergent site if its flanking sequence (±10 amino acids) met any one of the following criteria: (i) mean sequence similarity < 0.7; (ii) lowest similarity between any two sequences < 0.35; or (iii) >5 successive indels in more than two species. We defined convergent substitutions at a site as inferred substitutions that resulted in the same amino acid along both branches of the pair examined for convergence, a definition that includes both convergent and parallel substitutions. A gene containing at least one convergent substitution was defined as a convergent gene. To ensure the robustness of our analyses, we repeated them with the inferred sites included at each of three posterior probability cutoffs: ≥0.95, ≥0.7, and ≥0.5.
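The site classification and the flanking-window filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. All function and parameter names are hypothetical. Similarity is taken here to be the fraction of identical residues over columns where neither sequence has a gap, the window is assumed to include the focal column, and divergent substitutions are assumed to be changes on both branches that end in different amino acids. The text does not specify any of these details.

```python
from itertools import combinations

GAP = "-"


def classify_pair(anc1, desc1, anc2, desc2, posteriors=(), cutoff=0.0):
    """Classify one site for a branch pair (hypothetical helper).

    anc*/desc* are the amino acids at the start and end of each branch.
    posteriors holds the posterior probabilities of the inferred states;
    the site is skipped if any of them falls below the cutoff (assumption).
    """
    if any(p < cutoff for p in posteriors):
        return None
    if anc1 == desc1 or anc2 == desc2:
        return None  # no substitution on at least one of the two branches
    # Same end state: convergent in the broad sense used in the text,
    # which covers both convergent and parallel substitutions.
    if desc1 == desc2:
        return "convergent"
    return "divergent"  # assumed definition: different end states


def pairwise_identity(a, b):
    """Fraction of identical residues over columns without gaps (assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def longest_gap_run(seq):
    """Length of the longest run of consecutive gap characters."""
    longest = run = 0
    for ch in seq:
        run = run + 1 if ch == GAP else 0
        longest = max(longest, run)
    return longest


def keep_site(alignment, site, flank=10, mean_min=0.7, pair_min=0.35,
              max_gap_run=5, max_gap_species=2):
    """Return True if the site passes the three flanking-window criteria.

    alignment maps species name to aligned amino acid string;
    site is a 0-based alignment column.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    start, end = max(0, site - flank), site + flank + 1
    windows = [seq[start:end] for seq in alignment.values()]
    sims = [pairwise_identity(a, b) for a, b in combinations(windows, 2)]
    if sum(sims) / len(sims) < mean_min:   # criterion (i)
        return False
    if min(sims) < pair_min:               # criterion (ii)
        return False
    n_gappy = sum(longest_gap_run(w) > max_gap_run for w in windows)
    if n_gappy > max_gap_species:          # criterion (iii)
        return False
    return True
```

Under these assumptions, the robustness check in the text corresponds to rerunning `classify_pair` with `cutoff` set to 0.95, 0.7, and 0.5 in turn and comparing the resulting site counts.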
