GO and KEGG pathway enrichment analyses were performed on the finalized gene list from the prioritization pipeline with the “goana” and “kegga” functions from the R package limma (Smyth et al. 2021), treating all known genes as the background universe (Young et al. 2010). Only one gene per locus was included in each “goana” and “kegga” analysis, with priority given to genes assigned to primary independent hits. Where a locus still had multiple assigned genes, one gene was selected at random so that loci with several functionally related genes would not bias the results. To identify an appropriate p value cutoff, 100 genes were randomly selected from the genome and run through the same enrichment analysis. This permutation was repeated 1000 times to generate a null distribution of the smallest p value from each permutation. For cluster-specific gene set enrichment analyses, each permutation drew as many random genes as there were genes in the cluster. To ensure the robustness of the results, the gene set enrichment analysis was repeated 50 times, each time with a new random selection of genes at loci with multiple assigned genes. GO and KEGG terms that passed the permutation cutoff in at least 40 of the 50 repeats were retained.
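The permutation cutoff and the 40/50 stability filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: a plain hypergeometric over-representation test stands in for limma's “goana” and “kegga”, the cutoff is assumed to be the 5th percentile of the null minimum p values (the text does not state which quantile was used), and all function names and input structures (a set of universe genes, a dict mapping each term to its member genes, and a list of candidate genes per locus) are hypothetical.

```python
import random
from math import comb


def hypergeom_upper_tail(k, n_universe, n_set, n_drawn):
    """P(X >= k) when n_drawn genes are sampled from n_universe genes,
    n_set of which belong to the term (one-sided over-representation)."""
    upper = min(n_set, n_drawn)
    hits = sum(comb(n_set, i) * comb(n_universe - n_set, n_drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(n_universe, n_drawn)


def enrichment_pvalues(genes, universe, gene_sets):
    """Over-representation p value for every term (stand-in for goana/kegga)."""
    genes = set(genes) & universe
    pvals = {}
    for term, members in gene_sets.items():
        members = members & universe
        overlap = len(genes & members)
        pvals[term] = hypergeom_upper_tail(
            overlap, len(universe), len(members), len(genes))
    return pvals


def null_min_pvalues(universe, gene_sets, n_genes, n_perm, rng):
    """Smallest p value from each of n_perm random gene draws of size n_genes."""
    pool = sorted(universe)  # fixed order so a seeded rng is reproducible
    return [min(enrichment_pvalues(rng.sample(pool, n_genes),
                                   universe, gene_sets).values())
            for _ in range(n_perm)]


def permutation_cutoff(null_mins, alpha=0.05):
    """ASSUMPTION: cutoff = alpha quantile of the null minimum p values."""
    ordered = sorted(null_mins)
    return ordered[max(int(alpha * len(ordered)) - 1, 0)]


def stable_terms(loci, universe, gene_sets, cutoff,
                 n_repeats=50, min_pass=40, rng=None):
    """Repeat the analysis with one randomly chosen gene per locus and keep
    terms that pass the cutoff in at least min_pass of n_repeats runs."""
    rng = rng or random.Random()
    passes = {term: 0 for term in gene_sets}
    for _ in range(n_repeats):
        genes = [rng.choice(candidates) for candidates in loci]
        for term, p in enrichment_pvalues(genes, universe, gene_sets).items():
            if p <= cutoff:
                passes[term] += 1
    return sorted(term for term, count in passes.items() if count >= min_pass)
```

For the main analysis, `null_min_pvalues` would be called with `n_genes=100` and `n_perm=1000`; for a cluster-specific analysis, `n_genes` would be set to the number of genes in that cluster. Each entry of `loci` is assumed to already contain only the prioritized candidate genes for that locus.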