scRNA-seq data processing

Peng Zhao
Qingzhou Yao
Pei-Jian Zhang
Erlinda The
Yufeng Zhai
Lihua Ao
Michael J. Jarrett
Charles A. Dinarello
David A. Fullerton

Raw scRNA-seq data were first processed with the 10x Genomics Cell Ranger pipeline (v3.1) and then analyzed in R (v3.6) with the Seurat package (http://satijalab.org/seurat/). Low-quality cells (<400 genes detected per cell and >25% mitochondrial transcripts per cell) were excluded from analysis. After quality control, 4749 cells (2433 control and 2316 OSS) were retained for subsequent analysis. Gene expression was log-normalized with a scale factor of 10,000, and the number of molecules detected per cell was then regressed out. Highly variable genes were selected and used for principal component analysis. Cells were projected into 2D space using UMAP with default parameters. Graph-based clustering on 10 principal components, with the resolution parameter set to 0.2, yielded four clusters. For each cluster, the 10 genes with the highest dispersion were used to construct the heatmap.
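
The quality-control and normalization arithmetic described above can be illustrated with a minimal plain-Python sketch. This is not the Seurat code used in the study; the function names, the toy cells, and their counts are hypothetical. It assumes that a cell is dropped when it fails either threshold (the text could also be read as requiring both), and that log-normalization means log(1 + count / total × 10,000), the formula behind Seurat's LogNormalize method.

```python
import math

# Thresholds and scale factor as stated in the text.
MIN_GENES = 400
MAX_PCT_MITO = 25.0
SCALE_FACTOR = 10_000


def passes_qc(n_genes, pct_mito):
    """Assumption: a cell is kept only if it passes BOTH thresholds."""
    return n_genes >= MIN_GENES and pct_mito <= MAX_PCT_MITO


def log_normalize(counts, scale_factor=SCALE_FACTOR):
    """Scale each count by the cell's total, multiply by scale_factor, apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical toy cells; "counts" is truncated to three genes for brevity.
toy_cells = {
    "cell_a": {"n_genes": 1200, "pct_mito": 4.0, "counts": [1, 1, 2]},
    "cell_b": {"n_genes": 350, "pct_mito": 3.0, "counts": [5, 0, 5]},   # too few genes
    "cell_c": {"n_genes": 900, "pct_mito": 40.0, "counts": [2, 2, 0]},  # too much mito
}

# Filter first, then normalize only the cells that pass QC.
kept = {
    name: log_normalize(cell["counts"])
    for name, cell in toy_cells.items()
    if passes_qc(cell["n_genes"], cell["pct_mito"])
}

print(sorted(kept))
```

Regressing out the number of molecules per cell, variable-gene selection, PCA, UMAP, and clustering are left to Seurat and are not reproduced here.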

The Broad Institute GSEA software (https://gsea-msigdb.org/gsea/) was used to analyze the normalized gene expression data. We used the hallmark gene set collection, which comprises 50 MSigDB gene sets, and followed the standard procedure described in the GSEA user guide (http://broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html). The FDR reported by GSEA is the estimated probability that a gene set with a given normalized enrichment score represents a false-positive finding; gene sets with an FDR < 0.25 were considered statistically significant.
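
The significance rule above amounts to a simple filter on the GSEA output. The sketch below is a plain-Python illustration, not part of the GSEA software: the tuple layout, the function name, and all NES and FDR values are hypothetical, and ordering the hits by absolute NES is a presentation choice not stated in the text.

```python
FDR_CUTOFF = 0.25  # threshold stated in the text


def significant_gene_sets(results, fdr_cutoff=FDR_CUTOFF):
    """Keep gene sets with FDR strictly below the cutoff, strongest |NES| first.

    results: list of (gene_set_name, nes, fdr_q) tuples (hypothetical layout).
    """
    hits = [row for row in results if row[2] < fdr_cutoff]
    return sorted(hits, key=lambda row: abs(row[1]), reverse=True)


# Made-up values for illustration only; these are not results from the study.
toy_results = [
    ("HALLMARK_INFLAMMATORY_RESPONSE", -1.6, 0.20),
    ("HALLMARK_TNFA_SIGNALING_VIA_NFKB", 2.1, 0.01),
    ("HALLMARK_MYOGENESIS", 1.1, 0.25),   # exactly at the cutoff, so excluded
    ("HALLMARK_E2F_TARGETS", 0.9, 0.60),
]

significant = significant_gene_sets(toy_results)
for name, nes, fdr in significant:
    print(f"{name}\tNES={nes:+.2f}\tFDR={fdr:.2f}")
```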
