With the availability of high-resolution scRNA-seq data, one main objective of this study is to explore new ways to generate the reference GEP matrices used in bulk tumor deconvolution, i.e., the matrix B described in the previous section. The ideal B matrix should yield maximal and robust discriminatory power between cell type clusters. In addition, pooled scRNA-seq data can serve as ground truth for benchmarking the performance of reference GEPs as well as deconvolution methods, because the true cell composition of the resulting bulk gene expression data is known. A similar idea has been implemented in a recent study [9]. The first step in constructing reference GEP matrices is to choose a panel of reference genes that can distinguish the cell populations. In this study, we focus on four gene panels: (1) the LM22 gene reference panel, designed by Newman et al., which contains 547 genes that distinguish 22 human hematopoietic cell phenotypes, including several T cell types, B cells, and natural killer cells; this is the default panel in CIBERSORT and has therefore been used extensively; (2) a panel of signature genes identified from previous literature, which contains 140 genes that serve as signatures for 15 major cell types, including HNSCC tumor cells, immune cells, T cell subtypes, and stromal cells (Additional file 3: Table S2); (3) the scRNA-derived marker gene panel discovered through the steps described previously in the Methods, which contains genes that are uniquely expressed in each cell population identified from the HNSC scRNA-seq data (Additional file 5: Table S4); and (4) a T-cell-specific GEP panel discovered through steps similar to those used for panel (3) but with a focus on four T cell subtypes (Additional file 4: Table S3). Note that we used only the gene lists of these panels. The GEP matrix for each panel is formed by averaging the expression of its genes over all single cells assigned to each cell population.
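The averaging step described above can be illustrated with a minimal sketch. It assumes a genes-by-cells expression array, one cell type label per cell, and a panel given as a list of gene names; the function name `build_reference_gep` and all argument names are hypothetical and are not taken from the study's code. Panel genes missing from the data are simply skipped, which is also an assumption.

```python
import numpy as np

def build_reference_gep(expr, gene_names, cell_labels, panel_genes):
    """Average single cells per cell type over a gene panel (illustrative sketch).

    expr        : genes x cells expression array
    gene_names  : gene name for each row of expr
    cell_labels : assigned cell type for each column of expr
    panel_genes : gene list of the reference panel (only the names are used)
    """
    expr = np.asarray(expr, dtype=float)
    cell_labels = np.asarray(cell_labels)

    # Keep only panel genes that are present in the data (assumption: skip the rest).
    gene_index = {g: i for i, g in enumerate(gene_names)}
    kept_genes = [g for g in panel_genes if g in gene_index]
    rows = [gene_index[g] for g in kept_genes]

    # One column of B per cell type: the mean profile of its assigned cells.
    cell_types = sorted(set(cell_labels.tolist()))
    B = np.column_stack(
        [expr[rows][:, cell_labels == ct].mean(axis=1) for ct in cell_types]
    )
    return B, kept_genes, cell_types
```

The returned `B` has one row per retained panel gene and one column per cell type, matching the orientation of a reference matrix in which each column is a cell type profile.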
To assess the prediction performance of the four GEP panels above, we tested them on in silico bulk tumors generated by aggregating single-cell transcriptome data. Expression data of individual cells from the same patient in the Puram study were pooled to form 15 in silico tumors, which exhibit varied cellular compositions.
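A minimal sketch of this pooling step is shown below. It assumes that pooling means summing the expression of all cells from one patient, which the text does not specify (a mean would differ only by a per-patient scale factor). The function name `make_in_silico_bulk` and its arguments are hypothetical. The sketch also returns the true cell type fractions per patient, which is what makes the pooled profiles usable as ground truth.

```python
import numpy as np

def make_in_silico_bulk(expr, patient_ids, cell_labels):
    """Pool single cells per patient into in silico bulk profiles (illustrative sketch).

    expr        : genes x cells expression array
    patient_ids : patient of origin for each column of expr
    cell_labels : assigned cell type for each column of expr
    """
    expr = np.asarray(expr, dtype=float)
    patient_ids = np.asarray(patient_ids)
    cell_labels = np.asarray(cell_labels)

    patients = sorted(set(patient_ids.tolist()))
    cell_types = sorted(set(cell_labels.tolist()))

    bulk = np.zeros((expr.shape[0], len(patients)))
    fractions = np.zeros((len(cell_types), len(patients)))

    for j, patient in enumerate(patients):
        in_patient = patient_ids == patient
        # Assumption: "pooling" is taken to be the sum over the patient's cells.
        bulk[:, j] = expr[:, in_patient].sum(axis=1)
        # True composition: share of the patient's cells assigned to each type.
        for i, ct in enumerate(cell_types):
            fractions[i, j] = np.mean(cell_labels[in_patient] == ct)

    return bulk, fractions, patients, cell_types
```

Each column of `fractions` sums to 1 and can be compared against the proportions estimated by a deconvolution method applied to the matching column of `bulk`.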