Single-cell hierarchical Poisson factorization (scHPF) analysis

Wenting Zhao
Athanassios Dovas
Eleonora Francesca Spinazzi
Hanna Mendes Levitin
Matei Alexandru Banu
Pavan Upadhyayula
Tejaswi Sudhakar
Tamara Marie
Marc L. Otten
Michael B. Sisti
Jeffrey N. Bruce
Peter Canoll
Peter A. Sims

For the scHPF model in Fig. 4, we combined scRNA-seq profiles from one vehicle-treated and one etoposide-treated slice from PW029; two vehicle-treated, one etoposide-treated, and one panobinostat-treated slice from each of PW030, PW032, PW034, and PW036; and two vehicle-treated and one panobinostat-treated slice from PW040, for a total of 21 samples (see Additional file 1: Table S1). To prevent any one sample from dominating the factors, we randomly sub-sampled the scRNA-seq profiles so that each of the 21 samples contributed 803 cells to the model, for a total of 16,863 cells. We then factorized the resulting merged count matrix using scHPF with default parameters and K = 17 (www.github.com/simslab/scHPF) [16, 23]. For all downstream analysis of the model, we removed two nuisance factors. The first was correlated with coverage and ranked housekeeping genes and ribosomal protein-encoding genes highly. The second ranked genes associated with cell stress and heat shock highly, likely a result of dissociation artifacts in a subset of cells and samples (Additional file 1: Fig. S7a). This resulted in an scHPF model with 15 factors (Additional file 2: Table S4).
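The equal-contribution sub-sampling described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `subsample_and_merge`, the dictionary input, and the toy Poisson counts are all hypothetical, and it assumes every sample's count matrix shares the same gene order.

```python
import numpy as np

def subsample_and_merge(count_matrices, n_cells=803, seed=0):
    """Draw n_cells cells without replacement from each sample and stack them.

    count_matrices: dict mapping a sample name to a (cells x genes) count
    array. All arrays are assumed to share the same gene order (hypothetical
    input layout, not specified in the text).
    """
    rng = np.random.default_rng(seed)
    merged, labels = [], []
    for name, counts in count_matrices.items():
        if counts.shape[0] < n_cells:
            raise ValueError(f"{name} has fewer than {n_cells} cells")
        keep = rng.choice(counts.shape[0], size=n_cells, replace=False)
        merged.append(counts[keep])
        labels.extend([name] * n_cells)
    return np.vstack(merged), np.array(labels)

# Toy data: 21 samples of different sizes, 50 genes each.
toy_rng = np.random.default_rng(1)
toy = {
    f"sample_{i}": toy_rng.poisson(1.0, size=(900 + 10 * i, 50))
    for i in range(21)
}
merged, labels = subsample_and_merge(toy)
print(merged.shape)  # (16863, 50), i.e. 21 samples x 803 cells
```

The merged matrix, not the per-sample matrices, is what would then be passed to the factorization step.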

Fig. 4. (a) UMAP embedding of scRNA-seq profiles from slice cultures of six patients, generated using the cell score matrix from joint scHPF analysis of the entire data set and colored by patient. (b) Same as a but colored by treatment condition. (c) Same as a but colored by the scHPF-imputed log-ratio of Chr. 7 to Chr. 10 average expression, where a high ratio (red) indicates malignant transformation. (d) Same as a but colored by expression of the oligodendrocyte marker PLP1. (e) Same as a but colored by expression of the myeloid marker CD14. (f) Same as a but colored by the total expression of the T cell receptor constant regions (TRAC, TRBC1, TRBC2). (g) Heatmap showing the log-ratio of the average expression of the top 100 genes in each etoposide-treated slice to that in each control slice, for each scHPF factor and each of three cell types: transformed (tumor), oligodendrocyte (oligo), and myeloid. (h) Same as g for panobinostat-treated slices. (i) Violin plots showing the distributions of the average expression of the top 100 genes in the Proliferation scHPF factor for each vehicle- and etoposide-treated slice for each patient in tumor cells. All within-patient comparisons between vehicle and treatment have p < 0.05 (Mann-Whitney U test) unless otherwise indicated (N.S., not significant). (j) Same as i for the Panobinostat1/MT scHPF factor for each vehicle- and panobinostat-treated slice in tumor cells. (k) Same as j for the Panobinostat2/Chemokine scHPF factor in tumor cells. (l) Same as j for the Panobinostat3/Oligo scHPF factor in oligodendrocytes. (m) Same as j for the Myeloid2/Pro-Inflammatory scHPF factor in myeloid cells. (n) Same as j for the Myeloid3/CD163 scHPF factor in myeloid cells.

To visualize the scHPF model, we generated a UMAP embedding using a Pearson correlation matrix computed from the cell score matrix. To cluster the scRNA-seq profiles using the Phenograph implementation of Louvain community detection [20], we used the same Pearson correlation matrix and k = 50 to construct a k-nearest neighbors graph. We conducted the aneuploidy analysis in Fig. 4c from the scHPF model by first computing the cell loading matrix Θ containing elements E[θi,k|x] for each cell-factor pair i,k and the gene sample weight matrix Β containing elements E[βg,k|x] for each gene-factor pair g,k, where x is the scRNA-seq count matrix. Next, we computed the diagonal cell scaling matrix Ξ containing elements E[ξi,i|x] × 10,000 for each cell i and finally:

where G is the log-transformed scHPF-imputed expectation value matrix for the expression level of each gene in each cell. We colored the UMAP embedding in Fig. 4c by the difference between the average value of G for genes on chromosome 7 and that for genes on chromosome 10. We scored each Phenograph cluster by the average of this value and took all cells in clusters with an above-average score to be malignantly transformed.
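The aneuploidy steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation. In particular, the exact expression for G is not reproduced here, so the sketch assumes the form G = log(ΞΘΒᵀ + 1), with a natural log and a pseudocount of 1; it also assumes that "above-average" means above the mean of the per-cluster scores. All function names and the toy values are hypothetical.

```python
import numpy as np

def imputed_log_expression(theta, beta, xi, scale=10_000):
    """theta: cells x K, beta: genes x K, xi: one scaling value per cell."""
    expected = (scale * xi)[:, None] * (theta @ beta.T)
    # ASSUMPTION: natural log with a pseudocount of 1.
    return np.log1p(expected)

def chromosome_score(G, gene_chrom, num="7", den="10"):
    """Per-cell difference of mean G between two chromosomes."""
    gene_chrom = np.asarray(gene_chrom)
    return G[:, gene_chrom == num].mean(axis=1) - G[:, gene_chrom == den].mean(axis=1)

def call_malignant(score, clusters):
    """Flag every cell belonging to a cluster whose mean score is above average."""
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    cluster_means = np.array([score[clusters == c].mean() for c in ids])
    # ASSUMPTION: the threshold is the mean of the per-cluster scores.
    return np.isin(clusters, ids[cluster_means > cluster_means.mean()])

# Toy example: 4 cells, 2 factors, 4 genes (two on chr 7, two on chr 10).
theta = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
beta = np.array([[2.0, 0.5], [2.0, 0.5], [0.5, 2.0], [0.5, 2.0]])
xi = np.full(4, 1e-4)            # scale * xi is 1 for every cell
gene_chrom = ["7", "7", "10", "10"]
clusters = [0, 0, 1, 1]          # stand-in for Phenograph labels

G = imputed_log_expression(theta, beta, xi)
score = chromosome_score(G, gene_chrom)
malignant = call_malignant(score, clusters)
print(malignant)
```

Because G is already log-transformed, the difference of chromosome means plays the role of a log-ratio, which matches how panel c of the figure is described.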

The fold-change values in the heatmaps in Fig. 4g, h were computed by dividing the average expression of the top 100 genes in each factor (rows) for the treated slice by that of each vehicle-treated slice (columns) and log-transforming. For select factors, the distribution of average expression of the top 100 genes across cells is shown for the tumor cells, oligodendrocytes, or myeloid cells for each slice in Fig. 4i–n.
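One heatmap entry of this kind can be sketched as below. The sketch is illustrative only and makes several assumptions that the text does not settle: genes are ranked by their weight in the gene-by-factor matrix, the expression matrix has already been restricted to one cell type, and the log is base 2. The names `top_genes` and `factor_log_fold_change` are hypothetical.

```python
import numpy as np

def top_genes(beta, factor, n_top=100):
    """Indices of the n_top highest-weighted genes for one factor."""
    return np.argsort(beta[:, factor])[::-1][:n_top]

def factor_log_fold_change(expr, slice_labels, beta, factor,
                           treated, vehicle, n_top=100):
    """Log-ratio of mean top-gene expression, treated slice over vehicle slice.

    expr: cells x genes expression matrix for a single cell type
    (ASSUMPTION: the caller has already subset the cells).
    """
    genes = top_genes(beta, factor, n_top)
    per_cell = expr[:, genes].mean(axis=1)   # the values shown in the violin plots
    slice_labels = np.asarray(slice_labels)
    treated_mean = per_cell[slice_labels == treated].mean()
    vehicle_mean = per_cell[slice_labels == vehicle].mean()
    # ASSUMPTION: base-2 log; the text only says "log-transforming".
    return np.log2(treated_mean / vehicle_mean)

# Toy example: 3 genes, 1 factor, top 2 genes, two cells per slice.
beta = np.array([[3.0], [2.0], [0.1]])
expr = np.array([[4.0, 4.0, 0.0],
                 [4.0, 4.0, 0.0],
                 [1.0, 1.0, 9.0],
                 [1.0, 1.0, 9.0]])
labels = ["drug", "drug", "vehicle", "vehicle"]
lfc = factor_log_fold_change(expr, labels, beta, factor=0,
                             treated="drug", vehicle="vehicle", n_top=2)
print(lfc)  # 2.0, since the top-gene mean is 4 in treated cells and 1 in vehicle cells
```

Repeating the call over every factor and every treated/vehicle slice pair would fill in a matrix with factors as rows and vehicle-treated slices as columns.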
