ScRNA-seq data with a read depth of 10× Genomics (including mRNA expression profiles) for 3200 cells from four HCC samples were obtained from the GSE146115 dataset in the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database (accessed on 15 July 2021) and were used to identify differentiation-related genes. The “Seurat” package of R (v3.5.243) was then used to process the scRNA-seq data: the PercentageFeatureSet function was used to calculate the percentage of mitochondrial gene expression in each cell, and cells with ≥5% mitochondrial gene expression were filtered out. Next, cells with a gene number <100 and a sequencing number <50 were filtered out using correlation analysis, which was conducted to elucidate the relationship between sequencing depth per cell and total intracellular RNA sequences.
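The quality-control thresholds described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not reproduce the Seurat API; the `CellQC` record, the function names, and the four example cells are hypothetical and serve only to show the arithmetic. It also assumes that a cell is removed if it fails any one of the three thresholds, which the text does not state explicitly.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CellQC:
    """Per-cell summary statistics (hypothetical record, not a Seurat object)."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    n_counts: int     # total counts in the cell ("sequencing number")
    mito_counts: int  # counts assigned to mitochondrial genes


def percent_mito(cell: CellQC) -> float:
    """Percentage of a cell's counts that come from mitochondrial genes."""
    if cell.n_counts == 0:
        return 0.0
    return 100.0 * cell.mito_counts / cell.n_counts


def passes_qc(cell: CellQC, max_mito_pct: float = 5.0,
              min_genes: int = 100, min_counts: int = 50) -> bool:
    """Keep a cell only if it clears all three thresholds (assumed reading)."""
    return (
        percent_mito(cell) < max_mito_pct   # cells with >=5% are removed
        and cell.n_genes >= min_genes       # cells with <100 genes are removed
        and cell.n_counts >= min_counts     # cells with <50 counts are removed
    )


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative values only; these are not taken from GSE146115.
cells = [
    CellQC("cell_A", n_genes=1200, n_counts=5000, mito_counts=100),  # 2% mito
    CellQC("cell_B", n_genes=800, n_counts=2000, mito_counts=100),   # 5% mito
    CellQC("cell_C", n_genes=80, n_counts=300, mito_counts=3),       # too few genes
    CellQC("cell_D", n_genes=30, n_counts=40, mito_counts=0),        # too few counts
]

kept = [c for c in cells if passes_qc(c)]
print([c.barcode for c in kept])  # ['cell_A']

# Relationship between sequencing depth per cell and detected genes.
r = pearson([c.n_counts for c in cells], [c.n_genes for c in cells])
print(round(r, 3))
```

In this sketch the correlation is computed on all cells before filtering, so it only describes the depth-versus-genes relationship; how the correlation result informed the filtering in the original analysis is not specified in the text.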
After filtering out low-quality cells, the scRNA-seq gene expression data were normalized using the LogNormalize method, and 1500 feature genes with high cell-to-cell variation were identified using the FindVariableFeatures function. In addition, gene expression profiles and the corresponding clinical information of RNA-seq data from 50 paracancerous tissues and 374 HCC samples were extracted from a publicly available genetic database of cancer patients, TCGA (https://tcga-date.nci.nih.gov/tcga) (accessed on 18 July 2021), as a training dataset. Simultaneously, 260 HCC samples were obtained from the ICGC (https://dcc.icgc.org/donor.LIRI-JP.tsv.gz) (accessed on 20 July 2021) dataset as a validation dataset; 28 patients without detailed clinical data (e.g., survival time = 0 or unknown; absence of pathological diagnosis) were removed from the study.
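The normalization step can be sketched as follows, again in plain Python rather than R. The `log_normalize` function follows the documented definition of Seurat's LogNormalize method (each count divided by the cell's total counts, multiplied by a scale factor of 10,000 by default, then transformed with log1p). The `top_variable_genes` function is a deliberately simplified stand-in that ranks genes by the variance of their normalized values; it is not the selection procedure used by FindVariableFeatures, and the function names, gene choices, and counts are hypothetical.

```python
import math
from statistics import pvariance


def log_normalize(counts: dict, scale_factor: float = 10_000) -> dict:
    """Log-normalize one cell: log1p(count / total_counts * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


def top_variable_genes(normalized_cells: list, n_top: int) -> list:
    """Rank genes by variance across cells (simplified stand-in, see lead-in)."""
    genes = list(normalized_cells[0].keys())
    variance = {
        gene: pvariance([cell[gene] for cell in normalized_cells])
        for gene in genes
    }
    # Sort by decreasing variance; break ties alphabetically for determinism.
    ranked = sorted(genes, key=lambda gene: (-variance[gene], gene))
    return ranked[:n_top]


# Illustrative raw counts for three cells; not taken from the dataset.
raw_cells = [
    {"ALB": 90, "AFP": 0, "ACTB": 10},
    {"ALB": 10, "AFP": 80, "ACTB": 10},
    {"ALB": 50, "AFP": 40, "ACTB": 10},
]

normalized = [log_normalize(cell) for cell in raw_cells]
print(top_variable_genes(normalized, n_top=2))  # ['AFP', 'ALB']
```

In the analysis described above, the corresponding selection would retain 1500 genes rather than two.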