Transcriptome signature analysis

PD Paul Datlinger
AR André F Rendeiro
CS Christian Schmidl
TK Thomas Krausgruber
PT Peter Traxler
JK Johanna Klughammer
LS Linda C Schuster
AK Amelie Kuchler
DA Donat Alpar
CB Christoph Bock

We first investigated the concordance between gRNAs across the whole transcriptome by measuring pairwise distances (L2-norm metric) between gRNAs. We used the median expression of single cells with the same gRNA assigned and assessed the significance of differences between the distributions of gRNAs targeting the same gene and gRNAs targeting different genes or between data types using the Mann-Whitney U test. Significant perturbations by a gRNA were assessed by grouping cell transcriptomes in the same manner (median) and comparing each gRNA with the respective stimulation condition, assessing significance using the Mann-Whitney U test. For gene-level quantifications, we combined the gRNA P values obtained from the previous step with Fisher’s method.
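The gene-level combination step can be illustrated with a minimal sketch of Fisher's method in plain Python. The function name `fisher_combine`, the gene labels, and the P values are hypothetical and for illustration only. The per-gRNA P values would come from a Mann-Whitney U test (for example a routine such as `scipy.stats.mannwhitneyu`), which is not shown here.

```python
import math

def fisher_combine(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the survival function has a closed form, so only the standard
    library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k d.f. > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical per-gRNA P values grouped by target gene (illustrative only).
grna_p = {"GENE_A": [0.01, 0.04, 0.20], "GENE_B": [0.60, 0.75]}
gene_p = {gene: fisher_combine(ps) for gene, ps in grna_p.items()}
```

Fisher's method assumes the combined P values are independent; whether that holds for gRNAs targeting the same gene is a judgment the sketch does not address.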

Sparse transcriptome signatures were used to position single cells sharing a putative perturbation (here, gene knockouts) within a multidimensional trajectory of cell states (here, measured activation of the TCR pathway in Jurkat cells). This approach is reminiscent of a method for assessing cross-contamination between cell types in single cells (ref. 23) and of a recently described approach for monitoring cell fate conversion (ref. 24). We proceeded in four steps: we identified genes associated with the TCR response (signature genes); selected cells assigned to a nontargeting control gRNA as representatives of the unperturbed stimulation (control cells); produced in silico mixtures of gene expression profiles of increasingly stimulated cells based on the control cells from both conditions (mixture profiles); and identified the mixture profile that best matched the expression profile of each single cell (signature position).

To determine a TCR-specific activation signature, we aggregated single cells by their assigned gRNA target genes and performed principal component analysis on all genes. This analysis detected a clear separation of genes by the TCR-activating anti-CD3/CD28 stimulation condition in principal component 1 (Supplementary Fig. 6c). Signature genes were defined as those with an absolute loading higher than the 99th percentile for this principal component. Functional analysis of the signature genes was performed with the gene set enrichment analysis tool Enrichr (ref. 25), and the retrieved combined score (log[P value] × z-score) was displayed.
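The signature-gene selection can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `signature_genes` and the synthetic data are hypothetical, and the sketch assumes the 99th percentile is taken over the absolute PC1 loadings of all genes.

```python
import numpy as np

def signature_genes(expr, gene_names, percentile=99.0):
    """Select genes whose absolute PC1 loading exceeds a percentile cutoff.

    expr: (n_groups, n_genes) matrix of aggregated expression values.
    Assumption: the cutoff is the given percentile of the absolute loadings.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes; vt[0] holds one PC1 loading per gene.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    abs_loadings = np.abs(vt[0])
    cutoff = np.percentile(abs_loadings, percentile)
    return [g for g, v in zip(gene_names, abs_loadings) if v > cutoff]

# Synthetic demo: 20 groups, 500 genes; the first 5 genes respond to stimulation.
rng = np.random.default_rng(0)
n_groups, n_genes = 20, 500
expr = rng.normal(0.0, 0.1, size=(n_groups, n_genes))
expr[10:, :5] += 5.0  # groups 10..19 play the role of the stimulated condition
genes = [f"gene{i}" for i in range(n_genes)]
selected = signature_genes(expr, genes)
```

In real data it is worth confirming first that PC1 does separate the stimulation conditions, as the text reports, before treating its loadings as a stimulation signature.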

Based on the median expression level of cells assigned to nontargeting control gRNAs, we constructed for the signature genes a matrix Z of stepwise weighted linear mixtures of expression profiles between the unstimulated and stimulated conditions, which is given by:

where μa and μb are vectors of the mean expression values of cells in the unstimulated and stimulated conditions, respectively; i indexes the n signature genes; and j indexes the m desired mixtures. We selected m = 100. To account for cell-to-cell variability, as well as for changes in the signature genes that overshoot the mean of the control cells, we generated an extended linear space of length m with boundaries -20 and 120. For m = 100, this is equivalent to μa and μb being placed at index j = 20 and j = 80 of the Z matrix, respectively.
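One way to build such a mixture matrix is sketched below. It assumes that column j is the weighted sum (1 - w_j) * μa + w_j * μb, with weights w_j running linearly from -0.20 to 1.20; this form is inferred from the definitions above rather than quoted from the original equation. The function name `mixture_matrix` and the example vectors are hypothetical.

```python
import numpy as np

def mixture_matrix(mu_a, mu_b, m=100, lower=-20.0, upper=120.0):
    """Return Z with shape (n_genes, m).

    Assumed form: Z[i, j] = (1 - w_j) * mu_a[i] + w_j * mu_b[i], where w_j
    runs linearly from lower/100 to upper/100. Weights below 0 or above 1
    extrapolate beyond the control profiles (the "extended" linear space).
    """
    mu_a = np.asarray(mu_a, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    weights = np.linspace(lower, upper, m) / 100.0
    # Broadcasting: (n_genes, 1) * (m,) -> (n_genes, m)
    return mu_a[:, None] * (1.0 - weights) + mu_b[:, None] * weights

# Two hypothetical signature genes in unstimulated (a) and stimulated (b) controls.
Z = mixture_matrix([0.0, 2.0], [1.0, 4.0])
```

The first and last columns of `Z` lie 20% beyond the unstimulated and stimulated control profiles, which leaves room for cells that undershoot or overshoot the controls.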

We positioned each single cell in the matrix by retrieving the argmax of the Pearson correlation of that cell to each mixture in the matrix. To ensure that technical variables were not confounding the signature positioning, we visualized the relationship between the correlation argmax and the mean number of unique reads per cell, the minimum correlation, and (1 - maximum correlation). Grouping cells by their gRNA assignment, we visualized the distribution of cell signature positions per group and calculated the mean group signature position for groups with more than ten cells. We then calculated the log fold deviation of each group of cells relative to the group of control cells within the respective stimulation condition, which gives a deviation relative to genetically unperturbed cells.
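The positioning and group-level deviation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the group labels `CTRL`, `KO`, and `RARE`, and all numbers are hypothetical; the logarithm base is not given in the text and base 2 is assumed; and the sketch presumes that no cell or mixture profile is constant across genes, since the correlation would then be undefined.

```python
import numpy as np

def signature_positions(cells, Z):
    """Return (argmax index per cell, full correlation matrix).

    cells: (n_cells, n_genes); Z: (n_genes, m) mixture matrix.
    Pearson correlation is computed as the dot product of centered,
    unit-norm vectors.
    """
    c = np.asarray(cells, dtype=float)
    z = np.asarray(Z, dtype=float)
    c = c - c.mean(axis=1, keepdims=True)
    z = z - z.mean(axis=0, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=0, keepdims=True)
    corr = c @ z  # shape (n_cells, m)
    return corr.argmax(axis=1), corr

def log_fold_deviation(positions, groups, control="CTRL", min_cells=10):
    """log2(mean group position / mean control position); base 2 is assumed.

    Only groups with more than min_cells cells are reported.
    """
    positions = np.asarray(positions, dtype=float)
    groups = np.asarray(groups)
    ctrl_mean = positions[groups == control].mean()
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        if mask.sum() > min_cells:
            out[str(g)] = float(np.log2(positions[mask].mean() / ctrl_mean))
    return out

# Hypothetical control profiles for six signature genes.
mu_a = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 4.0])
mu_b = np.array([6.0, 1.0, 7.0, 2.0, 9.0, 3.0])
w = np.linspace(-0.2, 1.2, 100)
Z = np.outer(mu_a, 1.0 - w) + np.outer(mu_b, w)

# Cells that coincide with mixtures 10, 50 and 90 should map back to them.
demo_cells = Z[:, [10, 50, 90]].T
pos, corr = signature_positions(demo_cells, Z)

# Hypothetical per-cell positions: 11 control, 11 knockout, 3 cells of a rare group.
positions = [40.0] * 11 + [20.0] * 11 + [80.0] * 3
labels = ["CTRL"] * 11 + ["KO"] * 11 + ["RARE"] * 3
lfd = log_fold_deviation(positions, labels)
```

Because Pearson correlation ignores scale and offset, the position reflects the shape of a cell's signature profile rather than its total expression, which is one reason to check it against read depth as described above.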

The above procedure was repeated for each bulk RNA-seq sample on the same set of genes, but with the μa and μb of the signature matrix taken as the medians of the eight nontargeting control bulk cell lines. Signature values for gRNAs or genes obtained from the CROP-seq data were compared with those derived from bulk RNA-seq where available, and the Pearson and Spearman correlations were calculated. To assess the robustness of signature positions derived from the CROP-seq data, we subsampled 100 fractions of the data set and, at each fraction, compared the gRNA- or gene-aggregated signatures with those provided by the bulk RNA-seq as reference using the Pearson correlation, over 100 iterations of random sampling.
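The subsampling check can be sketched roughly as below. The details are assumptions rather than the authors' procedure: cells are drawn without replacement, a draw that misses any group entirely is recorded as NaN, and the function name `subsample_correlations`, the group labels, and all values are hypothetical.

```python
import numpy as np

def subsample_correlations(positions, groups, bulk, fractions, n_iter=100, seed=0):
    """Pearson correlation of group-mean positions with a bulk reference.

    Returns an array of shape (len(fractions), n_iter). Assumptions: cells
    are sampled without replacement, and a draw in which some group has no
    cells yields NaN.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    groups = np.asarray(groups)
    names = sorted(bulk)
    ref = np.array([bulk[g] for g in names], dtype=float)
    out = np.full((len(fractions), n_iter), np.nan)
    for a, frac in enumerate(fractions):
        size = max(1, int(round(frac * len(positions))))
        for b in range(n_iter):
            idx = rng.choice(len(positions), size=size, replace=False)
            sub_pos, sub_grp = positions[idx], groups[idx]
            if all(np.any(sub_grp == g) for g in names):
                means = np.array([sub_pos[sub_grp == g].mean() for g in names])
                out[a, b] = np.corrcoef(means, ref)[0, 1]
    return out

# Hypothetical data: three groups of 30 cells with known mean positions.
offsets = np.tile([-2.0, -1.0, 0.0, 1.0, 2.0], 6)
positions = np.concatenate([20.0 + offsets, 50.0 + offsets, 80.0 + offsets])
labels = np.repeat(["A", "B", "C"], 30)
bulk_reference = {"A": 22.0, "B": 48.0, "C": 81.0}  # illustrative bulk values
result = subsample_correlations(positions, labels, bulk_reference,
                                fractions=[0.5, 1.0], n_iter=20)
```

Plotting the spread of each row of `result` against its fraction shows how quickly the agreement with bulk stabilizes as more cells are included.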
