Gene-level essentiality predictions

Barbara Bosch, Michael A. DeJesus, Nicholas C. Poulton, Wenzhu Zhang, Curtis A. Engelhart, Anisha Zaveri, Sophie Lavalette, Nadine Ruecker, Carolina Trujillo, Joshua B. Wallach, Shuqi Li, Sabine Ehrt, Brian T. Chait, Dirk Schnappinger, Jeremy M. Rock

CRISPRi gene essentiality predictions were made using a modified version of the resampling approach previously used for TnSeq gene essentiality predictions in M. tuberculosis (DeJesus et al., 2017). Briefly, read counts at 24.3 generations were compared with and without ATc (± ATc). Read counts were normalized in two steps, as described in Quantification of sgRNA depletion: first to account for sequencing depth (using TTR), and then using the control sgRNAs. For each gene, normalized counts were permuted across the +ATc and –ATc conditions at 24.3 generations for a total of 20,000 iterations. While permutation tests typically look for differences in mean counts, the presence of sgRNAs of different strengths can disproportionately affect the mean for a given gene (e.g., a gene targeted with many weak and few strong sgRNAs). A difference at a lower percentile was therefore the more relevant test statistic, because it is more sensitive to the presence of just a few strong guides. Thus, at each iteration i, the difference in the 20th percentile of the counts between the two conditions was estimated:
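
A form of this statistic consistent with the definitions that follow can be written as below. The symbol $\Delta_{i,g}$ and the order of subtraction (+ATc minus –ATc) are assumptions; the counts on the right-hand side are those assigned to each condition by the permutation at iteration $i$.

```latex
\Delta_{i,g} = P_{20\%}\!\left(C^{\mathrm{norm}}_{+\mathrm{ATc},\,g}\right)
             - P_{20\%}\!\left(C^{\mathrm{norm}}_{-\mathrm{ATc},\,g}\right)
```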

where $P_{20\%}$ is the 20th-percentile function and $C^{\mathrm{norm}}_{A,g}$ represents the normalized counts for gene g under a given condition A. The 20,000 instances of the test statistic estimated after all iterations represented the distribution of the test statistic under the null hypothesis. A p-value was estimated by comparing the observed value of the test statistic to this null distribution. p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). An adjusted p-value threshold of $p_{\mathrm{adj}} < 0.01$ was used to assess statistical significance.
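
The resampling test and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the test is taken to be two-sided, the statistic is computed as +ATc minus –ATc, and a +1 pseudocount keeps the estimated p-value above zero.

```python
import numpy as np


def percentile_diff(plus, minus, q=20.0):
    """Difference in the q-th percentile of counts: +ATc minus -ATc (order assumed)."""
    return np.percentile(plus, q) - np.percentile(minus, q)


def permutation_pvalue(plus, minus, n_iter=20_000, q=20.0, seed=0):
    """Permutation test for one gene on the difference in the q-th percentile.

    plus, minus: normalized sgRNA counts for the gene in +ATc and -ATc samples.
    Returns (observed statistic, two-sided p-value). Two-sidedness and the
    +1 pseudocount are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    plus = np.asarray(plus, dtype=float)
    minus = np.asarray(minus, dtype=float)
    observed = percentile_diff(plus, minus, q)

    pooled = np.concatenate([plus, minus])
    n_plus = plus.size
    null = np.empty(n_iter)
    for i in range(n_iter):
        # Shuffle condition labels by permuting the pooled counts.
        shuffled = rng.permutation(pooled)
        null[i] = percentile_diff(shuffled[:n_plus], shuffled[n_plus:], q)

    # Fraction of permuted statistics at least as extreme as the observed one.
    extreme = np.count_nonzero(np.abs(null) >= abs(observed))
    return observed, (extreme + 1) / (n_iter + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

In use, `permutation_pvalue` would be called once per gene, and the resulting p-values passed together to `benjamini_hochberg` before comparing each adjusted value against 0.01.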

For each gene, a summary log2 fold change (L2FC) was estimated to assess the biological significance of the effect size. The summary L2FC was taken as the median of the 10 strongest sgRNAs (i.e., the sgRNAs with the smallest L2FC values). The L2FC cutoff was chosen by varying the threshold, comparing the resulting CRISPRi essentiality predictions against the TnSeq predictions of essentiality, and selecting the threshold that maximized the F1-score. The optimal threshold was estimated at L2FC < 5.1 at 24.3 generations. Genes passing both thresholds (i.e., L2FC < 5.1 and $p_{\mathrm{adj}} < 0.01$) were called CRISPRi essential genes by our methodology.
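
The gene-level summary, the threshold search, and the final call can be sketched as below. All function and parameter names are hypothetical, and the L2FC cutoff is passed in as an argument rather than hard-coded. The text does not say whether the significance filter was applied during the F1 optimization; this sketch assumes it was.

```python
import numpy as np


def summary_l2fc(sgrna_l2fc, n_strongest=10):
    """Median L2FC of the n_strongest (smallest-L2FC) sgRNAs for one gene."""
    values = np.sort(np.asarray(sgrna_l2fc, dtype=float))
    return float(np.median(values[:n_strongest]))


def f1_score(predicted, reference):
    """F1-score of boolean predictions against boolean reference calls."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(predicted & reference)
    fp = np.count_nonzero(predicted & ~reference)
    fn = np.count_nonzero(~predicted & reference)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def call_essential(gene_l2fc, padj, l2fc_cutoff, alpha=0.01):
    """Call a gene essential when it passes both the L2FC and padj thresholds."""
    gene_l2fc = np.asarray(gene_l2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    return (gene_l2fc < l2fc_cutoff) & (padj < alpha)


def best_l2fc_cutoff(gene_l2fc, padj, tnseq_essential, candidates, alpha=0.01):
    """Return the candidate cutoff (and its F1) that best matches TnSeq calls.

    Assumption: the padj filter is applied while scanning cutoffs.
    """
    scores = [
        f1_score(call_essential(gene_l2fc, padj, c, alpha), tnseq_essential)
        for c in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

Here `summary_l2fc` would be applied to each gene's sgRNA-level L2FC values, `best_l2fc_cutoff` scans a grid of candidate thresholds against the TnSeq reference calls, and `call_essential` applies the chosen cutoff together with the adjusted p-value threshold.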
