We obtained a whole-genome alignment of 42 species (birds and nonavian reptiles) for detecting ratite-accelerated regions from Sackton et al. (2019; see that study for full details on data collection). Conserved regions in the genome alignment were called with phastCons from the PHAST package (Siepel et al. 2005). A total of 284,001 CNEEs were extracted as conserved regions at least 50 bp in length that do not overlap any exons. Sequence from the extinct moa was subsequently added to the CNEE alignments based on a pairwise moa–emu whole-genome alignment (see Sackton et al. 2019 for details). For the mammalian data set, we started with the UCSC 100-way vertebrate alignment (Blanchette et al. 2004; http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/; last accessed March 15, 2019), removed all nonmammalian sequences, and then extracted sequence for 383,185 CNEEs in a fashion similar to that for birds (conserved regions identified by PHAST, each at least 50 bp and not overlapping any exons). The list of species is in supplementary material, Supplementary Material online. We then removed CNEEs with poor alignment quality across the 62 mammal species, defined as those in which alignment gaps made up more than 80% of the alignment length in more than 50 species, yielding 283,369 candidate CNEEs. For both phylogenies, we obtained branch lengths, the rate-matrix parameters of the nucleotide substitution model (GTR, General Time Reversible), and equilibrium nucleotide frequencies as estimated by phyloFit (taken from Sackton et al. 2019 for birds and from UCSC for mammals) using background, putatively neutral sequences (in our case, 4-fold degenerate sites; Siepel et al. 2005).
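The gap-based quality filter for the mammalian CNEEs can be illustrated with a minimal sketch. It rests on one reading of the criterion, namely that a CNEE is dropped when more than 50 species each have gaps covering more than 80% of the alignment length. The function names, the in-memory representation (a dictionary mapping species to aligned sequence), and the choice to count only `-` as a gap are assumptions made for illustration, not the code used in the study.

```python
from typing import Dict

# Assumption: only '-' marks an alignment gap; missing-data symbols are not counted.
GAP_CHARS = frozenset("-")


def gap_fraction(seq: str) -> float:
    """Fraction of alignment columns that are gaps in one species' row."""
    if not seq:
        return 1.0  # treat an empty row as entirely gapped
    return sum(ch in GAP_CHARS for ch in seq) / len(seq)


def is_poorly_aligned(
    alignment: Dict[str, str],
    max_gap_fraction: float = 0.8,
    max_gappy_species: int = 50,
) -> bool:
    """True if more than `max_gappy_species` rows exceed `max_gap_fraction` gaps."""
    gappy = sum(gap_fraction(seq) > max_gap_fraction for seq in alignment.values())
    return gappy > max_gappy_species


def filter_cnees(cnees: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep only the CNEE alignments that pass the gap filter."""
    return {
        cnee_id: aln
        for cnee_id, aln in cnees.items()
        if not is_poorly_aligned(aln)
    }
```

Both thresholds are strict inequalities in this sketch, so a CNEE with exactly 50 heavily gapped species, or a species with exactly 80% gaps, is retained.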