Gene-level essentiality predictions

Barbara Bosch; Michael A. DeJesus; Nicholas C. Poulton; Wenzhu Zhang; Curtis A. Engelhart; Anisha Zaveri; Sophie Lavalette; Nadine Ruecker; Carolina Trujillo; Joshua B. Wallach; Shuqi Li; Sabine Ehrt; Brian T. Chait; Dirk Schnappinger; Jeremy M. Rock

This method is extracted from research article: Cell, Aug 2021

Genome-wide gene expression tuning reveals diverse vulnerabilities of M. tuberculosis

DOI: 10.1016/j.cell.2021.06.033

CRISPRi gene essentiality predictions were made using a modified version of the resampling approach previously used for TnSeq gene essentiality predictions in M. tuberculosis (DeJesus et al., 2017). Briefly, read counts at 24.3 generations were compared ± ATc. Read counts were normalized in two steps as described in Quantification of sgRNA depletion: first to account for sequencing depth (using TTR), and then using the control sgRNAs. For each gene, normalized counts were permuted across the +ATc and –ATc conditions at 24.3 generations for a total of 20,000 iterations. While permutation tests typically look for differences in mean counts, the presence of sgRNAs of different strengths can disproportionately affect the mean for a given gene (e.g., a gene targeted with many weak and few strong sgRNAs). This made the difference at a lower percentile the more relevant test statistic, as it is more sensitive to the presence of just a few strong guides. Thus, at each iteration, $i$, the difference in the 20th percentile between the counts was estimated:

$$\Delta_{i}^{g} = P^{20\%}\left(C_{\mathrm{norm}}^{+\mathrm{ATc},\,g}\right) - P^{20\%}\left(C_{\mathrm{norm}}^{-\mathrm{ATc},\,g}\right)$$

where $P^{20\%}$ is the 20th-percentile function and $C_{\mathrm{norm}}^{A,g}$ represents the normalized counts for gene $g$ in a given condition $A$ (+ATc or –ATc). The 20,000 values of the test statistic obtained over all iterations represented the distribution of the test statistic under the null hypothesis. A p-value was estimated by comparing the observed value of the test statistic to this null distribution. p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). An adjusted p-value threshold of $p_{\mathrm{adj}} < 0.01$ was used to assess statistical significance.
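The resampling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the +ATc minus –ATc sign convention, the two-sided comparison, and the +1 pseudocount in the p-value are all assumptions made for illustration, and the inputs are taken to be the already-normalized sgRNA counts for a single gene.

```python
import numpy as np


def percentile_permutation_test(plus_atc, minus_atc, q=20.0, n_iter=20_000, seed=0):
    """Permutation test on the difference in the q-th percentile of counts.

    Assumptions (not from the source): statistic is +ATc minus -ATc,
    the test is two-sided, and a +1 pseudocount avoids p = 0.
    """
    rng = np.random.default_rng(seed)
    plus_atc = np.asarray(plus_atc, dtype=float)
    minus_atc = np.asarray(minus_atc, dtype=float)

    # Observed test statistic for this gene.
    observed = np.percentile(plus_atc, q) - np.percentile(minus_atc, q)

    # Pool the counts and reshuffle condition labels at each iteration.
    pooled = np.concatenate([plus_atc, minus_atc])
    n_plus = plus_atc.size
    null = np.empty(n_iter)
    for i in range(n_iter):
        perm = rng.permutation(pooled)
        null[i] = np.percentile(perm[:n_plus], q) - np.percentile(perm[n_plus:], q)

    # Fraction of null statistics at least as extreme as the observed one.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_iter + 1)
    return observed, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted
```

In practice the test would be run once per gene, and the resulting p-values collected across all genes before being passed to `benjamini_hochberg`.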

For each gene, a summary L2FC was estimated to assess the biological significance of the effect size. L2FC was summarized as the median value of the 10 strongest sgRNAs (i.e., the sgRNAs with the smallest L2FC). The L2FC cutoff was determined by varying the L2FC threshold, comparing the resulting CRISPRi essentiality predictions against the TnSeq predictions of essentiality, and selecting the threshold that maximized the F1-score. The optimal threshold was estimated at L2FC $< -5.1$ at 24.3 generations. Genes meeting both thresholds (i.e., L2FC $< -5.1$ and $p_{\mathrm{adj}} < 0.01$) were called as CRISPRi essential genes by our methodology.
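The effect-size summary and threshold scan can be sketched in the same way. This is again an illustration under stated assumptions rather than the published code: the helper names are hypothetical, genes with fewer than 10 sgRNAs simply use all available guides, and the significance filter is applied while scanning thresholds, which the text does not specify.

```python
import numpy as np


def summary_l2fc(guide_l2fc, n_strongest=10):
    """Median L2FC of the strongest (most negative) sgRNAs for one gene."""
    values = np.sort(np.asarray(guide_l2fc, dtype=float))
    # Assumption: if fewer than n_strongest guides exist, use them all.
    return float(np.median(values[:n_strongest]))


def f1_score(predicted, reference):
    """F1-score of boolean essentiality calls against a reference set."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(predicted & reference)
    fp = np.sum(predicted & ~reference)
    fn = np.sum(~predicted & reference)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def best_l2fc_threshold(gene_l2fc, p_adj, reference, candidates, alpha=0.01):
    """Scan candidate L2FC cutoffs and return the one with the highest F1-score.

    `reference` holds the reference essentiality calls (e.g. from TnSeq),
    one boolean per gene, in the same order as `gene_l2fc` and `p_adj`.
    """
    gene_l2fc = np.asarray(gene_l2fc, dtype=float)
    significant = np.asarray(p_adj, dtype=float) < alpha
    scores = [
        f1_score((gene_l2fc < threshold) & significant, reference)
        for threshold in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

A gene would then be called essential when its summary L2FC falls below the selected cutoff and its adjusted p-value is below `alpha`.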

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).