Processing of scRNA-Seq Data

Miao Su; Kuang-Yuan Qiao; Xiao-Li Xie; Xin-Ying Zhu; Fu-Lai Gao; Chang-Juan Li; Dong-Qiang Zhao

This method is extracted from research article: Front Genet, Feb 2021

Development of a Prognostic Signature Based on Single-Cell RNA Sequencing Data of Immune Cells in Intrahepatic Cholangiocarcinoma

DOI: 10.3389/fgene.2020.615680

R scripts were written to analyze the scRNA-seq data. The count files were read into R and formatted; expression values of duplicated genes were averaged, and the transcriptome data of ICC cells and adjacent tissue cells were merged into a single matrix. We used the R package "Seurat" to process the data, including quality control, gene and cell filtering, normalization, variable gene identification, data scaling, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). All parameters were left at their defaults unless otherwise specified.

First, the single-cell data were passed to the CreateSeuratObject function (arguments: min.cells = 3, min.features = 200) to create the Seurat object. This step also excluded poor-quality cells: only genes detected in at least three cells and cells with at least 200 detected genes were used in the following analysis. Quality control was based on the gene count, the number of gene types, and the percentage of mitochondrial genes per cell, the last of which was calculated with the PercentageFeatureSet function (argument: pattern = "^MT-"). The correlation between the number of sequenced genes and the number of sequenced gene types was calculated and visualized with the FeatureScatter function.

Second, to exclude non-cells and cell aggregates, the subset function was used to retain only cells with more than 500 expressed gene types, a gene expression level of more than 1,000 and fewer than 20,000, and a mitochondrial proportion of <20%. The data were log-normalized with the NormalizeData function, and the top 1,500 variable genes were identified with the FindVariableFeatures function (arguments: selection.method = "vst", nfeatures = 1500) for subsequent analysis.

Third, we used the ScaleData function (argument: vars.to.regress = "percent.mt") to scale the data and to mitigate the mitochondrial percentage as a source of variation in the dataset. PCA was then performed with the RunPCA function for dimension reduction.
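
The preprocessing steps above can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the original analysis script: the helper names `average_duplicated_genes` and `qc_to_pca` and the project label are hypothetical, and reading "gene types" as Seurat's `nFeature_RNA` and "gene expression level" as `nCount_RNA` is an assumption. The input `counts` is assumed to be a numeric genes x cells matrix with gene symbols as row names and cell barcodes as column names.

```r
# Average rows that share a gene symbol (base R only).
# `counts`: numeric genes x cells matrix; `genes`: one symbol per row.
average_duplicated_genes <- function(counts, genes) {
  sums <- rowsum(counts, group = genes, reorder = FALSE)  # per-gene row sums
  n <- as.vector(table(genes)[rownames(sums)])            # rows per gene
  sums / n                                                # per-gene row means
}

# QC, normalization, variable genes, scaling and PCA.
# Requires the Seurat package; calls are namespace-qualified.
qc_to_pca <- function(counts, project = "ICC") {
  obj <- Seurat::CreateSeuratObject(counts = counts, project = project,
                                    min.cells = 3, min.features = 200)
  # Percentage of counts from mitochondrial genes (symbols starting "MT-")
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-")
  # Assumed mapping: gene types -> nFeature_RNA, expression level -> nCount_RNA
  obj <- subset(obj, subset = nFeature_RNA > 500 & nCount_RNA > 1000 &
                  nCount_RNA < 20000 & percent.mt < 20)
  obj <- Seurat::NormalizeData(obj)  # default method is log-normalization
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 1500)
  obj <- Seurat::ScaleData(obj, vars.to.regress = "percent.mt")
  Seurat::RunPCA(obj)  # runs on the variable genes by default
}
```

A typical call would first collapse duplicated symbols with `average_duplicated_genes(raw, gene_symbols)` and then pass the result to `qc_to_pca()`; `raw` and `gene_symbols` are placeholder names for the merged matrix and its gene annotation.
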
After calculation with the JackStraw function, the JackStrawPlot (dims = 1:20) and ElbowPlot (ndims = 40) functions were used to identify the number of significant principal components (PCs) to use for clustering. Based on these plots, the top 20 PCs were selected for the next analysis. Lastly, the cell populations were clustered with the FindClusters function at a resolution of 0.5, and the RunTSNE function was used to generate the t-SNE projection of the clusters.

The FindAllMarkers function (arguments: min.pct = 0.25, logfc.threshold = 0.25) was used to find markers by comparing each cluster with all others; genes differentially expressed between two identities were identified with the FindMarkers function. Feature plots and heatmaps of gene expression were generated with the Seurat functions FeaturePlot and DoHeatmap, respectively. Cell type–specific marker genes were taken from the published literature (Zhang et al., 2020) and compared with our analysis results to define the cell type of each cluster. Clusters consisting of immune cells were extracted and processed again in the same way as above, and each immune cell type was further divided into subclusters. Marker genes of each immune cell type were identified by comparing ICC subclusters with normal subclusters, with an adjusted P-value (adjPval) <0.05 as the cutoff criterion. The marker genes of each immune cell type were incorporated as differentially expressed genes (DEGs).
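
The PC selection, clustering, and marker steps can be sketched along the same lines. Again, this is an illustrative sketch rather than the original script: the function names `pc_diagnostics`, `cluster_and_markers`, and `filter_by_adj_p` are hypothetical. The calls to `ScoreJackStraw` and `FindNeighbors` are not mentioned in the text; they are included as assumptions because Seurat requires them before `JackStrawPlot` and `FindClusters`, respectively. `p_val_adj` is the adjusted P-value column returned by Seurat's marker functions.

```r
# Diagnostic plots for choosing the number of PCs (requires Seurat).
pc_diagnostics <- function(obj) {
  obj <- Seurat::JackStraw(obj)                    # defaults left unchanged
  obj <- Seurat::ScoreJackStraw(obj, dims = 1:20)  # assumption: not in text
  print(Seurat::JackStrawPlot(obj, dims = 1:20))
  print(Seurat::ElbowPlot(obj, ndims = 40))
  obj
}

# Clustering, t-SNE embedding and per-cluster markers on the top 20 PCs.
cluster_and_markers <- function(obj, dims = 1:20) {
  obj <- Seurat::FindNeighbors(obj, dims = dims)   # assumption: not in text
  obj <- Seurat::FindClusters(obj, resolution = 0.5)
  obj <- Seurat::RunTSNE(obj, dims = dims)
  markers <- Seurat::FindAllMarkers(obj, min.pct = 0.25,
                                    logfc.threshold = 0.25)
  list(object = obj, markers = markers)
}

# Keep markers below the adjusted P-value cutoff (base R only).
filter_by_adj_p <- function(markers, cutoff = 0.05) {
  markers[markers$p_val_adj < cutoff, , drop = FALSE]
}
```

The same `filter_by_adj_p()` helper can be applied to a `FindMarkers` result when comparing ICC and normal subclusters, since that function returns the same `p_val_adj` column.
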

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
