2.3 Generating null distributions

Henry Cousins; Taryn Hall; Yinglong Guo; Luke Tso; Kathy T H Tzeng; Le Cong; Russ B Altman

This method is extracted from research article: Bioinformatics, Nov 2022

Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19

DOI: 10.1093/bioinformatics/btac735

Generating a null distribution of enrichment scores (ESs) for a given gene set is important both for ranking gene sets of different sizes against one another and for assigning significance levels. In the GSEA algorithm for pre-ranked gene lists, null distributions are generated by sampling a random gene set G_k′ that contains the same number of members in the ranked list as the original set G_k, then recalculating ES. This implicitly defines a null hypothesis of no association between genes, which, for large gene sets, can yield highly sensitive estimates of significance at the expense of specificity.

Therefore, by default, GSPA generates null distributions by first resampling the original gene set to create G_k′, then creating a null set of proximal genes P_k′ as in the original ES calculation for GSPA (Fig. 1B). A null ES is calculated from P_k′, and this procedure is repeated a fixed number of times (100 by default). Alternatively, users can test a less stringent null hypothesis by directly resampling P_k itself. Both methods constitute hybrid null hypotheses (i.e. that relative gene expression patterns do not differ between the gene set and background genes) that reduce exactly to the original GSEA prerank algorithm as r decreases to zero, but the former method directly accounts for known correlations between genes (Maleki et al., 2020).

Once the ES and the null ES distribution have been calculated, the normalized ES (NES; a transformation of ES that accounts for gene set size), P-value and false discovery rate (FDR) are calculated as in GSEA (Subramanian et al., 2005).
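The resampling procedure described above can be illustrated with a minimal Python sketch. It is not the GSPA implementation: the function names, the Euclidean radius rule used to build the proximal set, the `direct` flag (one reading of "directly resampling P_k", taken here to mean sampling a random set of the same size as P_k), and all data are assumptions made for illustration. The ES is a simplified GSEA-style weighted running sum, and the NES and P-value follow the usual GSEA convention of comparing the observed ES with null scores of the same sign.

```python
import numpy as np

def enrichment_score(ranked_scores, in_set, weight=1.0):
    """GSEA-style running-sum ES; scores are sorted descending, in_set is a boolean mask."""
    in_set = np.asarray(in_set, dtype=bool)
    hits = np.abs(ranked_scores) ** weight * in_set
    if hits.sum() == 0 or in_set.all():
        return 0.0  # degenerate set: no hits or no misses
    p_hit = np.cumsum(hits) / hits.sum()
    p_miss = np.cumsum(~in_set) / (~in_set).sum()
    running = p_hit - p_miss
    return float(running[np.argmax(np.abs(running))])

def proximal_mask(embedding, member_idx, r):
    """Assumed expansion rule: genes within Euclidean distance r of any set member."""
    members = embedding[member_idx][None, :, :]
    dist = np.linalg.norm(embedding[:, None, :] - members, axis=2)
    return (dist <= r).any(axis=1)

def null_es(ranked_scores, embedding, set_size, r, n_perm=100, direct=False, seed=0):
    """Null ES values from resampled gene sets.

    direct=False: sample G_k' of size |G_k|, then expand it to P_k' (stricter null).
    direct=True:  set_size is taken as |P_k| and a random set of that size is scored
                  without expansion (assumed reading of the less stringent null).
    """
    rng = np.random.default_rng(seed)
    n_genes = len(ranked_scores)
    null = np.empty(n_perm)
    for i in range(n_perm):
        sampled = rng.choice(n_genes, size=set_size, replace=False)
        if direct:
            mask = np.zeros(n_genes, dtype=bool)
            mask[sampled] = True
        else:
            mask = proximal_mask(embedding, sampled, r)
        null[i] = enrichment_score(ranked_scores, mask)
    return null

def nes_and_pvalue(es, null):
    """Normalize ES by the mean magnitude of same-sign null scores; empirical P-value."""
    same_sign = null[null >= 0] if es >= 0 else null[null < 0]
    if same_sign.size == 0:
        return float("nan"), float("nan")
    nes = es / np.abs(same_sign).mean()
    p_value = float((np.abs(same_sign) >= abs(es)).mean())
    return nes, p_value

# Synthetic example: 200 genes, an 8-dimensional embedding, a 10-gene set.
demo_rng = np.random.default_rng(1)
n_genes = 200
scores = np.sort(demo_rng.normal(size=n_genes))[::-1]  # pre-ranked, descending
emb = demo_rng.normal(size=(n_genes, 8))
g_k = demo_rng.choice(n_genes, size=10, replace=False)
r = 2.5

p_k = proximal_mask(emb, g_k, r)
es = enrichment_score(scores, p_k)
null = null_es(scores, emb, set_size=len(g_k), r=r)                      # default null
null_direct = null_es(scores, emb, set_size=int(p_k.sum()), r=r, direct=True)
nes, p_value = nes_and_pvalue(es, null)
```

In this sketch, setting `r` to zero makes `proximal_mask` return only the sampled genes themselves, so the default null coincides with plain prerank-style resampling, which mirrors the reduction described in the text. FDR estimation across multiple gene sets is omitted.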

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
