R scripts were written to analyze the scRNA-seq data. The count files were read into R and formatted, expression values of duplicated genes were averaged, and the transcriptome data of ICC cells and adjacent tissue cells were merged into a single matrix. We used the R package “Seurat” to process the data, including quality control, gene and cell filtering, normalization, identification of variable genes, data scaling, principal component analysis (PCA), clustering, and t-distributed stochastic neighbor embedding (t-SNE). Default parameters were used unless otherwise specified. First, a Seurat object was created from the single-cell data with the CreateSeuratObject function (arguments: min.cells = 3, min.features = 200), which also removes poor-quality cells: only genes detected in at least three cells and cells with at least 200 detected genes were retained for the following analysis. For quality control, the number of detected genes and the total counts per cell (both computed when the object is created) were examined together with the percentage of mitochondrial gene counts, which was calculated with the PercentageFeatureSet function (arguments: pattern = MT-). The correlation between the total counts and the number of detected genes per cell was calculated with the FeatureScatter function, and these quality-control metrics were visualized. Second, to exclude non-cells (e.g., empty droplets or debris) and cell aggregates, the subset function was used to retain only cells with more than 500 detected genes, total counts (expression level) of more than 1,000 and fewer than 20,000, and a mitochondrial proportion below 20%. The data were log-normalized with the NormalizeData function, and the top 1,500 variable genes were identified with the FindVariableFeatures function (arguments: selection.method = vst, nfeatures = 1500) for subsequent analysis. Third, the data were scaled with the ScaleData function (vars.to.regress = percent.mt), which regresses out the percentage of mitochondrial counts to mitigate this source of technical variation. PCA was then performed with the RunPCA function for dimension reduction.
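The filtering and normalization steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' R script and not Seurat's implementation; it assumes the counts are held as a dictionary mapping each gene symbol to a list of per-cell counts, treats genes whose symbol starts with "MT-" as mitochondrial, and uses the thresholds stated in the text as defaults. All function names (average_duplicates, keep_genes, qc_metrics, passing_cells, log_normalize) are hypothetical. The normalization mirrors the idea of Seurat's LogNormalize method: each count is divided by the cell's total, multiplied by a scale factor of 10,000, and transformed with the natural log1p.

```python
import math
from collections import defaultdict

# Hypothetical data layout (assumption): {gene symbol: [count in cell 0, cell 1, ...]}


def average_duplicates(rows):
    """Average per-cell values of rows that share a gene symbol.

    rows: iterable of (gene, counts) pairs, possibly with repeated genes.
    """
    sums, n_rows = {}, defaultdict(int)
    for gene, counts in rows:
        if gene in sums:
            sums[gene] = [a + b for a, b in zip(sums[gene], counts)]
        else:
            sums[gene] = [float(c) for c in counts]
        n_rows[gene] += 1
    return {g: [v / n_rows[g] for v in vals] for g, vals in sums.items()}


def keep_genes(matrix, min_cells=3):
    """Keep genes detected (value > 0) in at least `min_cells` cells."""
    return {g: c for g, c in matrix.items() if sum(v > 0 for v in c) >= min_cells}


def qc_metrics(matrix, n_cells, mt_prefix="MT-"):
    """Per-cell detected genes, total counts and percent mitochondrial counts."""
    n_feature = [0] * n_cells
    n_count = [0.0] * n_cells
    mt_count = [0.0] * n_cells
    for gene, counts in matrix.items():
        is_mt = gene.startswith(mt_prefix)
        for j, c in enumerate(counts):
            if c > 0:
                n_feature[j] += 1
            n_count[j] += c
            if is_mt:
                mt_count[j] += c
    pct_mt = [100.0 * m / t if t > 0 else 0.0 for m, t in zip(mt_count, n_count)]
    return n_feature, n_count, pct_mt


def passing_cells(n_feature, n_count, pct_mt,
                  min_features=500, min_count=1000, max_count=20000, max_pct_mt=20):
    """Indices of cells meeting the subset criteria given in the text."""
    return [j for j, (f, c, p) in enumerate(zip(n_feature, n_count, pct_mt))
            if f > min_features and min_count < c < max_count and p < max_pct_mt]


def log_normalize(matrix, cells, scale_factor=10_000):
    """LogNormalize-style transform: log1p(count / cell total * scale_factor)."""
    totals = {j: sum(counts[j] for counts in matrix.values()) for j in cells}
    return {g: [math.log1p(counts[j] / totals[j] * scale_factor) if totals[j] > 0 else 0.0
                for j in cells]
            for g, counts in matrix.items()}
```

In a run following the order described in the text, average_duplicates would be applied first, then keep_genes, then qc_metrics and passing_cells, and finally log_normalize on the passing cells only.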
After the JackStraw function was run, the JackStrawPlot (dims = 1:20) and ElbowPlot (ndims = 40) functions were used to determine the number of significant principal components (PCs) to use for clustering. Based on these plots, the top 20 PCs were selected for downstream analysis. Lastly, cells were clustered with the FindClusters function (resolution = 0.5), and the RunTSNE function was used to visualize the clusters in two dimensions by t-SNE. The FindAllMarkers function (arguments: min.pct = 0.25, logfc.threshold = 0.25) was used to identify marker genes by comparing each cluster with all other cells, and differentially expressed genes between two identities were identified with the FindMarkers function. Feature plots and heatmaps of gene expression were generated with the Seurat functions FeaturePlot and DoHeatmap, respectively. Cell type–specific marker genes from published literature (Zhang et al., 2020) were compared with our results to assign a cell type to each cluster. Clusters of immune cells were extracted and reprocessed as described above, and each immune cell type was further divided into subclusters. Marker genes of each immune cell type were identified by comparing ICC subclusters with normal subclusters, with an adjusted P-value (adjPval) < 0.05 as the cutoff. These marker genes of each immune cell type were taken as differentially expressed genes (DEGs).
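The marker-gene filtering described above (min.pct, logfc.threshold, and an adjusted P-value cutoff of 0.05) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Seurat's code: it applies a two-sided Wilcoxon rank-sum test (normal approximation with tie correction, no continuity correction), computes the log2 fold change of mean expression after undoing the log1p transform with a pseudocount of 1, and applies a Bonferroni correction over all genes in the input. Seurat's test and fold-change formula differ between versions, so values will not match FindMarkers output exactly. The function names and the input layout (one dictionary per group mapping each gene to its log-normalized values per cell) are hypothetical.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """1-based ranks with ties given their average rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_p(x, y):
    """Two-sided rank-sum p-value via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:  # all values tied: no evidence of a difference
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def find_markers(group1, group2, min_pct=0.25, logfc_threshold=0.25, alpha=0.05):
    """Return (gene, log2FC, p, p_adj) for genes passing all filters, sorted by p_adj."""
    n_genes = len(group1)  # Bonferroni over every gene supplied (assumption)
    hits = []
    for gene, x in group1.items():
        y = group2[gene]
        # Fraction of cells with nonzero expression in each group
        pct1 = sum(v > 0 for v in x) / len(x)
        pct2 = sum(v > 0 for v in y) / len(y)
        if max(pct1, pct2) < min_pct:
            continue
        # Undo log1p, average, then compare on the log2 scale with pseudocount 1
        log2fc = (math.log2(mean(math.expm1(v) for v in x) + 1)
                  - math.log2(mean(math.expm1(v) for v in y) + 1))
        if abs(log2fc) < logfc_threshold:
            continue
        p = wilcoxon_p(x, y)
        p_adj = min(1.0, p * n_genes)
        if p_adj < alpha:
            hits.append((gene, log2fc, p, p_adj))
    return sorted(hits, key=lambda h: h[3])
```

In the protocol's terms, group1 and group2 would correspond, for example, to the cells of one cluster versus all other cells (FindAllMarkers), or to the ICC versus normal subclusters of one immune cell type.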
