Single cell RNA-seq secondary analyses

Ellen K. Velte
Bryan A. Niedenberger
Nicholas D. Serra
Anukriti Singh
Lorena Roa-DeLaCruz
Brian P. Hermann
Christopher B. Geyer

Raw count single-cell gene expression matrices were imported into the R package Seurat v1.4.03 (Satija et al., 2015) and filtered to retain cells with ≥200 detected genes, ≥2500 unique molecular identifiers (UMIs) and <10% of transcripts corresponding to mitochondrial genes, and genes expressed in ≥3 cells. Gene expression values were then log-normalized and scaled. Unsupervised cell clustering and dimensionality reduction with t-SNE were performed in Seurat based on the statistically significant principal components. The top ten differentially expressed genes (marker genes) of each cell cluster were determined with a log fold change threshold of ≥0.25 using the default Wilcoxon rank-sum test. After identification of cell clusters, raw count matrices restricted to cluster subsets (e.g. without testicular somatic cells) or sample subsets (e.g. RA only) were re-analyzed with Seurat and subsequently imported into Monocle2 (Qiu et al., 2017; Trapnell et al., 2014) for additional combined t-SNE, unsupervised density peaks clustering and trajectory analyses (genes expressed in ≥10 cells, minimum expression of 0.1). The top 1000 differentially expressed genes (DEGs; q value <0.01) over pseudotime were used for trajectory analysis, which ordered cells in pseudotime by cell state. A pseudotime heatmap of these DEGs was plotted using the ward.D2 clustering method (Hermann et al., 2018). Lists of differentially expressed genes were analyzed by Ingenuity Pathway Analysis [Qiagen, Build 481437M, content version 44691306 (6/2018)] to identify biological pathways that were significantly over-represented among the genes in each list.
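The filtering and log-normalization steps above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the Seurat code used in the analysis. The thresholds are those stated in the text; the function name, the `mt-` gene-name prefix used to flag mitochondrial genes, the scale factor of 10,000, and the choice to evaluate both filters on the unfiltered matrix are assumptions made for illustration. The subsequent scaling step is not shown.

```python
import math


def filter_and_normalize(counts, gene_names, min_genes=200, min_umis=2500,
                         min_cells=3, max_mito_fraction=0.10,
                         mito_prefix="mt-", scale_factor=10_000):
    """Filter a genes x cells raw count matrix, then log-normalize it.

    counts[g][c] is the UMI count of gene g in cell c.
    Returns (kept_gene_names, kept_cell_indices, normalized_matrix).
    mito_prefix and scale_factor are assumptions, not values from the text.
    """
    n_genes = len(counts)
    n_cells = len(counts[0]) if n_genes else 0
    mito_rows = [g for g, name in enumerate(gene_names)
                 if name.startswith(mito_prefix)]

    # Gene filter: expressed (count > 0) in at least min_cells cells.
    keep_genes = [g for g in range(n_genes)
                  if sum(1 for v in counts[g] if v > 0) >= min_cells]

    # Cell filter: enough detected genes and UMIs, low mitochondrial fraction.
    keep_cells = []
    for c in range(n_cells):
        column = [counts[g][c] for g in range(n_genes)]
        total = sum(column)
        detected = sum(1 for v in column if v > 0)
        mito = sum(counts[g][c] for g in mito_rows)
        if (detected >= min_genes and total >= min_umis and total > 0
                and mito / total < max_mito_fraction):
            keep_cells.append(c)

    # Log-normalize: ln(1 + count / cell_total * scale_factor),
    # where cell_total is computed over the retained genes.
    totals = [sum(counts[g][c] for g in keep_genes) for c in keep_cells]
    normalized = []
    for g in keep_genes:
        row = []
        for c, total in zip(keep_cells, totals):
            value = counts[g][c] / total * scale_factor if total else 0.0
            row.append(math.log1p(value))
        normalized.append(row)

    return [gene_names[g] for g in keep_genes], keep_cells, normalized
```

Called with its defaults, the function applies the thresholds listed above; the keyword arguments exist only so the same logic can be tried on a small toy matrix.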

The P1.5 RA-treated data sets were merged with previously published single-cell transcriptomes from sorted P6 ID4-EGFPbright and ID4-EGFPdim spermatogonia (Hermann et al., 2018), comprising a total of 31,830 cells; data were analyzed in Seurat with the same QC and scaling as described above. Linear dimensional reduction was performed using the RunPCA function based on 2195 variable genes, and cells were clustered using principal components (PCs) 1-10 and a resolution of 0.6. This resulted in 19 clusters, among which clusters 0, 3, 5 and 16 were identified as germ cell clusters, using expression of germ cell markers (Ddx4 and Gata4) visualized as violin plots. Subsetting these clusters yielded a log-normalized gene expression matrix of 9922 germ cells and 21,324 genes for subsequent analysis. Following preprocessing by Seurat, the germ cell clusters were analyzed in Pagoda2 (Fan et al., 2016) using the default QC function. Gene variance normalization was performed, followed by dimensional reduction by PC analysis (PCA; nPcs=50). Subsequently, clustering on a k-nearest neighbor (KNN) graph built in PCA space identified 50 cell clusters, and embeddings were then generated using largeVis based on the PCA reduction. Differential gene expression of each cluster (≥log2 fold change) and pathway overdispersion were determined to generate an unsupervised hierarchical clustering dendrogram. Lists of genes differentially expressed between clusters 30 and 26 were generated within the PAGODA2 web app using a Mann–Whitney U test and were used for gene set enrichment analysis in Metascape (http://metascape.org/).
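For readers unfamiliar with the Mann–Whitney U test used for the cluster comparison, the following is a minimal sketch of the statistic for a single gene measured in two groups of cells. It is written in plain Python and does not reproduce the PAGODA2 web app, whose implementation details are not described here. The function name is hypothetical, and the two-sided p-value relies on a tie-corrected normal approximation without continuity correction, which is an assumption of this sketch.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value) comparing samples x and y.

    Tied values receive average ranks, which matters for single-cell data
    where many cells share a value of zero. The p-value uses a normal
    approximation (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2

    # Pool the values, remembering which sample each one came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda pair: pair[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        tie_size = j - i
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += average_rank * n_from_x
        tie_term += tie_size ** 3 - tie_size
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values identical: no evidence of a difference

    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value
```

In practice the test would be run once per gene, with `x` and `y` holding that gene's normalized expression in the cells of each cluster, and the resulting p-values would need correction for multiple testing.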
