We selected the following candidate genes involved in the digestion of dietary carbohydrates, proteins and lipids to determine gene copy number and test for positive diversifying selection: the carbohydrase amylase; the proteolytic enzymes aminopeptidase a, aminopeptidase b, aminopeptidase Ey, aminopeptidase N, aminopeptidase Ey-like, chymotrypsin A, chymotrypsin B, chymotrypsin-like and trypsin; and the lipolytic enzymes phospholipase B1, group XIIB secretory phospholipase A2-like protein, carboxyl ester lipase and carboxyl ester lipase-like enzyme.
To evaluate amylase gene copy number, we used previously published variants of C. violaceus amylase (amy2a and amy2b [28]; NCBI accessions KT920438 and KT920439) to search our assembled genome with both MUMmer v. 3.23 [36] and BLAST [37]. We then used fascut, a Perl script that is part of the FAST Analysis of Sequences Toolbox [38], to trim the contig containing the amylase loci and neighbouring genes. Gene models in these fragments were predicted with AUGUSTUS v. 3.2.3 [39], and Genomicus v. 96.01 (http://www.dyogen.ens.fr/genomicus/) and Ensembl v. 97 were used to visualize the syntenic regions of the candidate gene in C. violaceus and in Danio rerio, Oryzias latipes and Gasterosteus aculeatus. We then obtained amylase sequences from multiple stichaeid species representing dietary diversity [22] (electronic supplementary material, figure S2): Anoplarchus purpurescens (carnivore), Dictyosoma burgeri (carnivore), Phytichthys chirus (omnivore), Xiphister atropurpureus (omnivore) and X. mucosus (herbivore [22,25]). The C. violaceus amylase sequences and the orthologous sequences from these five other prickleback species were aligned in MEGA v. 7.0.26 [40] using MUSCLE (codon alignment, default parameters [41]). Selection was estimated with the adaptive branch-site random effects likelihood (aBSREL) model, a branch-site model that infers the optimal number of ω (nonsynonymous/synonymous rate ratio) classes for each branch and tests whether a proportion of sites on that branch has evolved under positive selection. Next, the mixed effects model of evolution (MEME) was used to identify individual sites subject to episodic positive or diversifying selection, and the genetic algorithm for recombination detection (GARD) was used to screen the alignment for signatures of recombination; all three methods are implemented in the Datamonkey v. 2.0 web application [42].
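aBSREL and MEME estimate ω by maximum likelihood under codon-substitution models fitted to a phylogeny (as implemented in HyPhy, which Datamonkey runs). To make the ratio itself concrete, the sketch below computes a single pairwise ω with the much simpler Nei-Gojobori counting method, which is a swapped-in technique and not the method used here. It assumes two gap-free, in-frame coding sequences of equal length containing only A, C, G and T, uses the standard genetic code, treats mutations to stop codons as nonsynonymous when counting sites (one of several conventions), and applies a Jukes-Cantor correction. All function names are illustrative.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites (0 to 3) in one codon.

    Assumption: mutations to stop codons are counted as nonsynonymous.
    """
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        syn = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                syn += 1
        sites += syn / 3
    return sites


def count_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over all single-step pathways that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*" and nxt != c2:
                valid = False  # pathway passes through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor distance; infinite when p is saturated (p >= 0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS, omega) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            raise ValueError(f"stop codon at nucleotide {i}")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = count_differences(c1, c2)
        Sd += sd
        Nd += nd
    dS = jukes_cantor(Sd / S)
    dN = jukes_cantor(Nd / N)
    omega = dN / dS if dS > 0 else math.nan
    return dN, dS, omega
```

A whole-gene pairwise average like this tends to stay below 1 even when a few sites or lineages are under positive selection, which is one reason branch-site and site-level models such as aBSREL and MEME are preferred for this kind of test.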
The assembled C. violaceus transcriptome was used to identify candidate protease genes. Once identified, the candidate genes were searched with BLAST against the assembled transcriptomes of the same stichaeid species used for the amylase analyses [11], and the hit with the highest bit score and a per cent identity greater than 70% was taken as the orthologue in each species. All orthologues identified in the stichaeid species were used to build multiple sequence alignments and in the molecular evolution analyses (aBSREL, MEME and GARD). jModelTest v. 2.1.0 [43] was used to select the best-fitting model of sequence evolution, and phylogenetic trees were constructed with PhyML v. 3.1 [44]. Synteny was analysed as described for amylase.
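The orthologue-selection rule can be sketched as a filter over BLAST+ tabular output (`-outfmt 6`, using its twelve default columns). This is a minimal sketch, not the authors' pipeline: it keeps, for each query, the subject with the highest bit score among hits above 70% identity and breaks ties on per cent identity, which is our reading of the rule. The function name and the sequence identifiers in the test are hypothetical.

```python
import csv

# Default 12-column layout of BLAST+ tabular output (-outfmt 6)
FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_orthologue_hits(lines, min_pident=70.0):
    """Pick one putative orthologue per query from -outfmt 6 lines.

    Keeps hits with per cent identity above `min_pident`, then chooses the
    highest bit score, breaking ties on per cent identity (an assumption).
    `lines` may be an open file or any iterable of tab-separated strings.
    Returns {query: (subject, pident, bitscore)}.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        rec = dict(zip(FIELDS, row))
        pident = float(rec["pident"])
        bitscore = float(rec["bitscore"])
        if pident <= min_pident:  # identity threshold (> 70% in the text)
            continue
        current = best.get(rec["query"])
        if current is None or (bitscore, pident) > (current[2], current[1]):
            best[rec["query"]] = (rec["subject"], pident, bitscore)
    return best
```

Note that a hit with a higher bit score but identity at or below the threshold is discarded before ranking, so the threshold acts as a hard filter rather than a tie-breaker.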
Similar methods were followed for phospholipase B1, group XIIB secretory phospholipase A2-like protein, carboxyl ester lipase and carboxyl ester lipase-like enzyme using the annotated C. violaceus transcriptome.