To test for functional enrichment among the 500 most diverged genes, we tested for overrepresentation of GO annotations using WebGestalt with default parameters (Liao et al. 2019). Specifically, we compared the biological process GO terms among the 500 most diverged genes against those of all genes with a model in at least one tissue. To confirm that the observed trends were not due to the particular threshold we chose, we also conducted a gene set enrichment analysis (GSEA). For this analysis, we ranked genes by the KW P value, choosing the smallest one for genes modeled in multiple tissues, then took the and used WebGestalt’s GSEA implementation to identify the 20 most enriched and depleted biological process GO terms.
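The ranking step can be illustrated with a minimal Python sketch. It assumes the per-tissue results are available as (gene, tissue, P value) records; the function name, the record layout, and the toy gene names are hypothetical and are not taken from the original analysis. It only shows how the smallest P value is kept for genes modeled in multiple tissues and how a top-N set follows from the ranking.

```python
def rank_genes_by_min_p(records):
    """Rank genes by their smallest P value across tissues.

    records: iterable of (gene, tissue, p_value) tuples (assumed layout).
    Returns a list of (gene, min_p) sorted by ascending min_p,
    with ties broken alphabetically by gene name.
    """
    best = {}
    for gene, _tissue, p in records:
        # Keep only the smallest P value seen for each gene.
        if gene not in best or p < best[gene]:
            best[gene] = p
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


# Toy records; gene names and P values are made up for illustration.
records = [
    ("GENE_A", "liver", 0.04),
    ("GENE_A", "brain", 0.001),
    ("GENE_B", "liver", 0.2),
    ("GENE_C", "brain", 0.01),
    ("GENE_C", "heart", 0.03),
]

ranked = rank_genes_by_min_p(records)

# A top-N threshold (500 in the text, 2 here) defines the "most diverged" set.
top_genes = [gene for gene, _p in ranked[:2]]
```

The ranked list, rather than the thresholded set, is the kind of input a rank-based enrichment tool consumes, which is why the same ranking can serve both analyses.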
We also tested for enrichment of several other gene sets of interest among the top 500 diverged genes: 1) genes whose expression in particular tissues is under stabilizing selection across 17 mammalian species (Chen et al. 2019); 2) genes that are intolerant to loss-of-function (LOF) variants in their protein products (called if the upper bound of the 95% confidence interval of the observed/expected ratio is lower than 0.35) (Lek et al. 2016); 3) housekeeping genes that show consistent expression across tissues (Eisenberg and Levanon 2013); and 4) a set of genes encoding virus-interacting proteins (Enard et al. 2016). We calculated an odds ratio for each gene set and used Fisher’s exact test to determine significance. For the genes under stabilizing selection on gene expression, we restricted the background to genes tested in that study before calculating statistics.
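As an illustration of the odds ratio and Fisher's exact test described above, the following is a minimal, standard-library-only Python sketch. The function names, the toy gene identifiers, and the choice of a two-sided test are assumptions made for this example; the text does not state which sidedness or software was used, and in practice a library routine such as `scipy.stats.fisher_exact` would normally be preferred.

```python
from math import comb


def enrichment_table(top_genes, gene_set, background):
    """Build the 2x2 table (a, b, c, d) after restricting to the background.

    a: in top set and in gene set      b: in top set only
    c: in gene set only                d: in neither
    """
    background = set(background)
    top = set(top_genes) & background
    members = set(gene_set) & background
    a = len(top & members)
    b = len(top - members)
    c = len(members - top)
    d = len(background - top - members)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite or NaN if b*c is zero."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy example: 20 background genes, 5 "top" genes, 5 gene-set members.
background = [f"g{i}" for i in range(20)]
top_genes = ["g0", "g1", "g2", "g3", "g4"]
gene_set = ["g0", "g1", "g2", "g3", "g10"]

table = enrichment_table(top_genes, gene_set, background)
or_value = odds_ratio(*table)
p_value = fisher_exact_two_sided(*table)
```

Restricting both the top genes and the gene set to the background before counting mirrors the restriction described for the stabilizing-selection set, where only genes tested in that study are considered.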