Sequence data acquisition and alignment
This protocol is extracted from research article:
Amino acid exchangeabilities vary across the tree of life
Sci Adv, Dec 4, 2019; DOI: 10.1126/sciadv.aax3124

Sequence data were retrieved from the sources listed in table S1. Coding sequence alignments of four mammalian clades, fruit flies, and yeasts were retrieved directly from the respective databases. For each of the other eukaryotic clades, we queried Ensembl (https://useast.ensembl.org/index.html) for a list of all one-to-one orthologous genes between the pair of species and downloaded their coding sequences. The coding sequences were translated to protein sequences using Multiple Alignment of Coding Sequences (MACSE) v1.02 (40). Local pairwise protein sequence alignment was performed for each pair of orthologs with Multiple Alignment using Fast Fourier Transform (MAFFT) v7.294b (41) using the L-INS-i algorithm. The corresponding coding sequence alignment was then derived from each protein alignment using a custom Python script. All prokaryotic clades were sampled from the strains available in the Alignable Tight Genomic Clusters (ATGC) database (42). All alignments were filtered so that no gaps, missing data, or ambiguous codons remained. The alignments and relevant Python scripts have been deposited on GitHub (https://github.com/ztzou/REvariation).
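
The two post-alignment steps described above (deriving a coding sequence alignment from a protein alignment, then filtering it) can be sketched roughly as follows. This is a minimal illustration, not the authors' deposited script. The function names thread_codons and filter_codon_columns are hypothetical, the example sequences are made up, and the sketch assumes that terminal stop codons have already been removed and that filtering drops individual alignment columns (the text does not say whether columns or whole alignments were discarded).

```python
GAP = "-"
UNAMBIGUOUS = set("ACGT")


def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Assumes the CDS has no terminal stop codon, so that it contains exactly
    one codon per non-gap residue in the aligned protein.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != GAP)
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("coding sequence does not match protein length")
    row, k = [], 0
    for aa in aligned_protein:
        if aa == GAP:
            row.append("---")      # a protein gap becomes a three-base gap
        else:
            row.append(codons[k])  # otherwise consume the next codon
            k += 1
    return row  # list of codons, one per alignment column


def filter_codon_columns(rows):
    """Keep only columns in which every row has a complete A/C/G/T codon."""
    kept = [
        col for col in zip(*rows)
        if all(len(c) == 3 and set(c.upper()) <= UNAMBIGUOUS for c in col)
    ]
    if not kept:
        return ["" for _ in rows]
    return ["".join(codons) for codons in zip(*kept)]


# Made-up ortholog pair: row A has a gap, row B has an ambiguous codon (AAN).
row_a = thread_codons("MK-L", "ATGAAACTG")
row_b = thread_codons("MKTL", "ATGAANACCCTG")
filtered = filter_codon_columns([row_a, row_b])
print(filtered)
```

Working on codon columns rather than single nucleotides keeps the reading frame intact after filtering, which matters if the filtered alignment is later passed to a codon-based analysis.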

For the analyses of orthologous versus nonorthologous genes between the rodent clade and the avian clade, we downloaded all coding sequences of mouse, rat, chicken, and turkey from Ensembl 84. In each species, the longest transcript of each gene was retained for subsequent analysis. We then obtained from Ensembl three lists of one-to-one orthologs: the first between mouse and rat, the second between chicken and turkey, and the third between mouse and chicken. We compared the REs estimated separately from four groups of genes: RO, AO, RN, and AN. RO refers to the genes that appear on both the first and third lists. AO refers to the genes that appear on both the second and third lists. RN refers to the genes that appear on the first list but not on the third list. AN refers to the genes that appear on the second list but not on the third list.
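
The transcript selection and the four gene groups defined above amount to a per-gene maximum and a few set operations. The sketch below is illustrative only. The function names and the gene identifiers are hypothetical, and it assumes that membership on a list is decided by the mouse gene for the rodent groups and by the chicken gene for the avian groups, which the text does not state explicitly.

```python
def longest_transcripts(transcripts):
    """Pick the longest coding sequence per gene.

    transcripts: iterable of (gene_id, transcript_id, cds) tuples.
    Ties keep the transcript encountered first (an arbitrary choice here).
    """
    best = {}
    for gene_id, tx_id, cds in transcripts:
        if gene_id not in best or len(cds) > len(best[gene_id][1]):
            best[gene_id] = (tx_id, cds)
    return best


def classify_groups(mouse_rat, chicken_turkey, mouse_chicken):
    """Split genes into RO, AO, RN, and AN from three one-to-one ortholog lists.

    Each list holds (gene_a, gene_b) pairs, ordered as in its name.
    """
    rodent = {mouse for mouse, _ in mouse_rat}             # first list
    avian = {chicken for chicken, _ in chicken_turkey}     # second list
    cross_mouse = {mouse for mouse, _ in mouse_chicken}    # third list, mouse side
    cross_chicken = {chick for _, chick in mouse_chicken}  # third list, chicken side
    return {
        "RO": rodent & cross_mouse,    # on the first and third lists
        "AO": avian & cross_chicken,   # on the second and third lists
        "RN": rodent - cross_mouse,    # on the first list only
        "AN": avian - cross_chicken,   # on the second list only
    }


# Hypothetical identifiers, for illustration only.
picked = longest_transcripts([
    ("m1", "m1.t1", "ATGAAA"),
    ("m1", "m1.t2", "ATGAAACTG"),
    ("m2", "m2.t1", "ATGCTG"),
])
groups = classify_groups(
    mouse_rat=[("m1", "r1"), ("m2", "r2")],
    chicken_turkey=[("c1", "t1"), ("c3", "t3")],
    mouse_chicken=[("m1", "c1")],
)
print(picked["m1"][0], groups)
```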



