Although we attempted to collect genomic data for species closely related to the YHSC, a lack of appropriate data meant that we were limited to four echinoderm genomes: A. japonicus, S. purpuratus, and two geographically separated populations of A. planci (Okinawa, Japan and the Great Barrier Reef, Australia). We also included two additional invertebrate genomes: D. melanogaster and S. kowalevskii (sequence sources given in the key resources table).
Functional domains in the protein sequences from the seven genomes were annotated against Pfam (version 31.0)114 with default parameters; Pfam classifies sequences into families by hidden Markov model (HMM) scanning. We then compared domain counts across the species analyzed. Following Wang et al.16, we considered a domain expanded if its z score was >1.96 and it had more than 5 members in the YHSC genome. To identify expanded genes within expanded domains, protein sequences sharing the domain of interest were aligned using MAFFT (version 7.453),112 and the resulting alignment was used to construct a phylogenetic tree with FastTree (version 2.1.11)115 under default parameters. Genes with a z score >1.96 and at least 4 copies in the YHSC genome were regarded as expanded.
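The domain-level filter described above can be illustrated with a minimal sketch. It assumes that the z score standardizes the YHSC count against the mean and sample standard deviation of the counts across all genomes in the table; the text does not state the exact formula, so this is an assumption. The function name, species keys, domain labels, and counts are all hypothetical.

```python
from statistics import mean, stdev


def expanded_domains(counts, focal="YHSC", z_cutoff=1.96, min_members=5):
    """Return {domain: z score} for domains expanded in the focal genome.

    counts maps each domain to {species: number of members}.
    Assumption: z is the focal count standardized against the mean and
    sample standard deviation over all genomes in the table.
    """
    expanded = {}
    for domain, per_species in counts.items():
        values = list(per_species.values())
        focal_n = per_species.get(focal, 0)
        if len(values) < 2:
            continue  # stdev needs at least two genomes
        sd = stdev(values)
        if sd == 0:
            continue  # identical counts in every genome: no signal
        z = (focal_n - mean(values)) / sd
        # Criterion from the text: z score >1.96 and more than 5 members
        if z > z_cutoff and focal_n > min_members:
            expanded[domain] = z
    return expanded


# Hypothetical counts for illustration only (not data from the study).
others = ["A_japonicus", "S_purpuratus", "A_planci_OKI", "A_planci_GBR",
          "D_melanogaster", "S_kowalevskii"]
example_counts = {
    "domain_A": {"YHSC": 30, **{sp: 5 for sp in others}},   # YHSC is a clear outlier
    "domain_B": {"YHSC": 4, **{sp: 1 for sp in others}},    # outlier, but too few members
    "domain_C": {"YHSC": 10, **{sp: 10 for sp in others}},  # no variation across genomes
}
print(expanded_domains(example_counts))
```

With seven genomes, a single outlier against six identical counts gives the largest z score this definition can produce, which is why only a domain that is both an outlier and large enough in the focal genome passes both thresholds in this toy example.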
Additionally, CAFE (version 4.2)34,35,140 was used to evaluate the expansion and contraction of gene families in the YHSC genome relative to the other six genomes. Taking gene family sizes and an ultrametric tree as inputs, CAFE infers the most likely gene family size at every internal node and identifies gene families with an accelerated rate of gain or loss.34 This analysis had four main steps. First, gene numbers per gene family across the seven genomes were calculated, and a species tree was constructed, using OrthoFinder (version 2.3.8)116 with the “-m msa” parameter. Gene families with ≥200 members in any single species were excluded from further analysis.16 Second, an ultrametric tree was constructed using MCMCTree in the PAML package (version 4.9)117, with three soft calibration bounds set based on TimeTree141: A. planci-A. japonicus: 450–605 Ma; S. purpuratus-S. kowalevskii: 535–763 Ma; and A. japonicus-D. melanogaster: 643–850 Ma. Third, CAFE34,35 was used to identify gene families with accelerated rates of gain and loss. Gene families for which the p value of the YHSC branch was <0.01 were defined as significantly expanded or contracted, following a previous study.142 Fourth, the numbers of gene families that had undergone expansion or contraction were plotted on the phylogenetic tree, following a previous study.34
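The two numeric filters in this workflow (excluding families with ≥200 members in any species, and keeping families with a YHSC-branch p value <0.01) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names, family identifiers, and the tuple of p value, focal size, and parent-node size are hypothetical stand-ins for values that would be parsed from OrthoFinder and CAFE output, and that parsing is not shown.

```python
def drop_large_families(counts, max_size=200):
    """Keep families whose size is below max_size in every species."""
    return {fam: sizes for fam, sizes in counts.items()
            if all(n < max_size for n in sizes.values())}


def classify_focal_branch(results, alpha=0.01):
    """Label families as expanded or contracted on the focal branch.

    results maps family -> (p_value, focal_size, parent_size), where
    parent_size is the size inferred at the node directly ancestral to
    the focal genome. This tuple layout is hypothetical.
    """
    labels = {}
    for fam, (p_value, focal_size, parent_size) in results.items():
        if p_value >= alpha or focal_size == parent_size:
            continue  # not significant, or no change in size
        labels[fam] = "expanded" if focal_size > parent_size else "contracted"
    return labels


# Hypothetical values for illustration only (not data from the study).
family_counts = {
    "OG1": {"YHSC": 12, "S_purpuratus": 3},
    "OG2": {"YHSC": 250, "S_purpuratus": 40},  # removed: >=200 in one species
    "OG3": {"YHSC": 6, "S_purpuratus": 5},
    "OG4": {"YHSC": 1, "S_purpuratus": 8},
}
kept = drop_large_families(family_counts)

branch_results = {
    "OG1": (0.002, 12, 4),  # significant gain
    "OG3": (0.2, 6, 5),     # not significant
    "OG4": (0.005, 1, 7),   # significant loss
}
labels = classify_focal_branch(branch_results)
print(sorted(kept), labels)
```

Counting the "expanded" and "contracted" labels per branch gives the numbers that would be annotated on the tree in the fourth step.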