Initial clustering was performed on the dataset using the first 20 PCs, and t-SNE was used only for data visualization. Clustering was run with the SNN-based FindClusters function, using the SLM algorithm and 10 iterations. Clustering was performed at varying resolution values, and we chose a final value of 1.2 for the resolution parameter for this stage of clustering. Clusters were assigned preliminary identities based on expression of combinations of known enriched genes for major cell classes and types. The full list of enriched genes is provided in Supplementary File 2, and average expression of all genes in all clusters is provided in Supplementary File 1. Low-quality cells, identified based on a combination of low gene/UMIFM counts and high levels of mitochondrial and nuclear transcripts (e.g. Malat1, Meg3, Kcnq1ot1), typically clustered together and were removed. Following assignment of preliminary identities, cells were divided into data subsets as separate Seurat objects (LHb neurons and MHb neurons) for further subclustering. The expression matrix for each data subset was further filtered to include only genes expressed by the cells in the subset (minimum cell threshold of 0.5% of cells in the subset). Subclustering was performed iteratively on each data subset to resolve additional cell types and subtypes. Briefly, clustering was run at high resolution, and the resulting clusters were ordered in a cluster dendrogram using the BuildClusterTree function in Seurat, which uses cluster-averaged PCs to calculate a PC distance matrix. Putative doublets/multiplets were identified based on expression of known enriched genes for different cell types not in the cell subset (e.g. neuronal and glial specific genes). Putative doublets tended to separate from other cells and cluster together, and these clusters were removed from the dataset.
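The per-subset gene filter described above (minimum cell threshold of 0.5% of cells in the subset) can be illustrated with a minimal sketch. This is not the Seurat code used in the study. The function name filter_genes, the dictionary layout, and the gene names and counts are hypothetical, and it assumes that "expressed" means a count greater than zero and that the threshold is inclusive.

```python
def filter_genes(counts, min_fraction=0.005):
    """Keep genes detected in at least `min_fraction` of the cells.

    counts: dict mapping gene name -> list of per-cell counts
            (hypothetical layout, one entry per cell in the subset).
    Assumption: a gene is "expressed" in a cell if its count is > 0,
    and the 0.5% threshold is inclusive.
    """
    kept = {}
    for gene, values in counts.items():
        n_cells = len(values)
        n_detected = sum(1 for v in values if v > 0)
        if n_cells and n_detected / n_cells >= min_fraction:
            kept[gene] = values
    return kept


# Toy subset of 1000 cells with made-up gene names and counts.
n = 1000
toy = {
    "GeneA": [1] * 5 + [0] * (n - 5),  # detected in 0.5% of cells
    "GeneB": [2] * 4 + [0] * (n - 4),  # detected in 0.4% of cells
    "GeneC": [0] * n,                  # never detected
}
filtered = filter_genes(toy)
print(sorted(filtered))
```

In this toy subset only GeneA reaches the threshold, so the other two genes are dropped from the matrix before subclustering.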
Cluster separation was evaluated using the AssessNodes function and inspection of differentially expressed genes at each node. Clusters with poor separation, based on differential expression of mostly housekeeping genes or activity-dependent genes (ADGs; see Figure 2–figure supplement 1), were merged to avoid over-separation of the data. The dendrogram was reconstructed after merging or removal of clusters, and the process of inspecting and merging or removing clusters was repeated until all resulting clusters could be distinguished based on a set of differentially expressed genes that we could validate separately. To calculate the “ADG Score” (Figure 2–figure supplement 1), we used the AddModuleScore function in Seurat with a list of ADGs that were highly expressed in some of the MHb clusters (Fos, Fosb, Egr1, Junb, Nr4a1, Dusp18, Jun, Jund).
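The general idea behind a module score such as the ADG Score can be sketched as follows: for each cell, the average expression of the gene set minus the average expression of control genes drawn from similar expression bins. This is a simplified illustration and not Seurat's AddModuleScore implementation. The function name module_score, the parameters n_bins, n_ctrl and seed, the control gene names (Ctrl1 to Ctrl6), and all expression values are hypothetical. Fos and Egr1 are taken from the ADG list above.

```python
import random
from statistics import mean


def module_score(expr, module_genes, n_bins=4, n_ctrl=2, seed=0):
    """Per-cell score: mean module expression minus mean control expression.

    expr: dict mapping gene name -> list of normalized values, one per cell.
    Control genes are sampled from the same average-expression bin as each
    module gene (a simplification of the binning idea, not Seurat's code).
    """
    module = [g for g in module_genes if g in expr]
    if not module:
        raise ValueError("none of the module genes are present in expr")
    rng = random.Random(seed)
    # Rank genes by average expression and split them into equal-sized bins.
    ranked = sorted(expr, key=lambda g: mean(expr[g]))
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    controls = set()
    for g in module:
        pool = [c for c in ranked if bin_of[c] == bin_of[g] and c not in module]
        controls.update(rng.sample(pool, min(n_ctrl, len(pool))))
    n_cells = len(expr[module[0]])
    scores = []
    for cell in range(n_cells):
        m = mean(expr[g][cell] for g in module)
        c = mean(expr[g][cell] for g in controls) if controls else 0.0
        scores.append(m - c)
    return scores


# Toy data: 3 cells; only the first cell expresses the two ADGs.
toy_expr = {"Fos": [6, 0, 0], "Egr1": [3, 0, 0]}
toy_expr.update({f"Ctrl{i}": [1.0, 1.0, 1.0] for i in range(1, 7)})
adg_scores = module_score(toy_expr, ["Fos", "Egr1"])
print(adg_scores)
```

In the toy data the first cell receives a higher score than the other two, which is the kind of per-cell contrast the ADG Score is meant to capture. Subtracting matched controls is intended to keep a high score from simply reflecting overall expression level.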
Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:
Wallace, M. (2021). Cell Clustering and cluster identification. Bio-protocol Preprint. bio-protocol.org/prep943.
Wallace, M. L., Huang, K. W., Hochbaum, D., Hyun, M., Radeljic, G. and Sabatini, B. L. (2020). Anatomical and single-cell transcriptional profiling of the murine habenular complex. eLife. DOI: 10.7554/eLife.51271
This protocol preprint was submitted via the "Request a Protocol" track.