We identified genes and gene sets responsible for phenotypic differences across squamate skin with two complementary methods: differential expression and gene set enrichment analyses.
To identify differentially expressed genes, we implemented three differential expression pipelines across an unpaired and a paired experimental design (see Supplementary material online for details). In the unpaired design, we used a single sample from each specimen; in the paired design, we used pairs of samples from each specimen (i.e., each specimen contributed with one sample from each skin color). To statistically control for idiosyncratic expression patterns shared by paired samples, we used specimen identification as a fixed factor when fitting the generalized linear models for differential expression in paired designs.
Given that the list of differentially expressed genes from distinct pipelines commonly shows only partial overlap, we chose to combine information from the three pipelines to identify a gene as differentially expressed. This partial overlap is due to peculiarities of each pipeline such as the read alignment software, the read count normalization, and the variance shrinkage approach implemented in the differential expression pipeline (Zhang et al. 2014; Costa-Silva et al. 2017). After performing preliminary comparisons across multiple software combinations, we restricted our analyses to three differential expression pipelines consisting of two read count and three differential expression software: salmon + DESeq2, salmon + edgeR and kallisto + sleuth (Robinson and Oshlack 2010; Love et al. 2014; Bray et al. 2016; Patro et al. 2017; Pimentel et al. 2017). Prior to running differential expression analyses, we converted expected transcript-wise read counts into expected gene-wise read counts using the R package tximport (Soneson et al. 2016). We ran kallisto, sleuth, edgeR, DESeq2, and tximport in R v.3.6.3 (R Core Team 2013).
Once we obtained gene-wise P-values for each pipeline within each experimental design, we used Fisher’s Combined Test (Fisher 1934) to identify which genes showed a consistent pattern of differential expression across all three pipelines. We considered a gene as candidate if it had a log2 fold change ≥ 1 (i.e., at least a 2-fold difference in expression) and a Fisher’s combined test false discovery rate ≤ 0.05. We implemented Fisher’s Combined Test with the “fisher, method” function from the R package “metaseqR” (Moulos 2020).
Lastly, we identified a differentially expressed gene as a candidate gene if it (fig. 4a): i) has been functionally linked to color or color pattern in other vertebrate taxa; ii) was consistently differentially expressed across more than one skin comparison (e.g., upregulated in orange skin relative to both yellow and white skin); or iii) showed consistent log-fold changes in expression across experimental designs for a given skin comparison. We tested for (iii) by estimating the correlation between log-fold changes across paired and unpaired experimental designs using R package “Rmisc” (Harrell 2020).
We focused our gene set enrichment analyses on gene sets hypothesized to be responsible for color and color pattern differences across yellow, orange, and white skin. White colors are hypothesized to be the product of a coherent scattering of light by guanine-platelets deposited in iridophores, while yellow and orange colors are hypothesized to be reflected by pteridines and/or carotenoids deposited in xanthophores/erythrophore (Bagnara and Hadley 1973). Therefore, following McLean et al. (2017), we tested for the enrichment of pathways associated with the synthesis of guanines, pteridines, and carotenoids (table S5, Supplementary material online). The “guanine synthesis” pathway included the enzymatic precursors for the production of guanine from phosphoribosyl pyrophosphate (Higdon et al. 2013); the “pteridine synthesis” pathway included genes from the tetrahydro-biopterin biosynthesis module, as well as genes responsible for the synthesis of drosopterins and sepiapterins (Ziegler 2003; Braasch et al. 2007); and the “carotenoid synthesis” pathway included genes from the retinol (vitamin A) metabolism (Waagmeester et al. 2009).
We used gene set enrichment analyses to test if genes belonging to each of these three gene sets (Goeman and Bühlmann 2007): i) were disproportionately represented among differentially expressed genes (i.e., over-representation test); ii) were differentially expressed as frequently as genes not in the gene set (i.e., competitive enrichment test); or iii) contained at least one differentially expressed gene (i.e., selfcontained enrichment test). We performed gene set enrichment analyses only for the unpaired experimental design. We implemented gene set enrichment analyses with three software: “enrichKEGG,” from the R package “clusterProfiler” v3.0.4 (over-representation test; Yu et al. 2020), and “fry” (selfcontained enrichment test; Wu et al. 2010), as well as “camera” (competitive enrichment test, Wu and Smyth 2012) from the R package edgeR. Prior to performing the over-representation test, we used KEGG’s Online Blast KEGG Orthology and Links Annotation (blastKOALA; Kanehisa et al. 2016) to align the differentially expressed transcripts with ORFs against the KEGG GENES database.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.