Cell Clustering and cluster identification

Michael L Wallace

Last updated: Mar 18, 2021

An abbreviated version of this protocol was published in eLife in February 2020:

Anatomical and single-cell transcriptional profiling of the murine habenular complex

Initial clustering was performed on the dataset using the first 20 PCs; t-SNE was used only for data visualization. Clustering was run with the SNN-based FindClusters function in Seurat, using the SLM algorithm with 10 iterations. Clustering was performed at varying resolution values, and we chose a final value of 1.2 for the resolution parameter at this stage of clustering.

Clusters were assigned preliminary identities based on expression of combinations of known enriched genes for major cell classes and types. The full list of enriched genes is provided in Supplementary File 2, and the average expression of all genes in all clusters is provided in Supplementary File 1. Low-quality cells, identified by a combination of low gene/UMIFM counts and high levels of mitochondrial and nuclear transcripts (e.g., Malat1, Meg3, Kcnq1ot1), typically clustered together and were removed.

Following assignment of preliminary identities, cells were divided into data subsets as separate Seurat objects (LHb neurons and MHb neurons) for further subclustering. The expression matrix for each data subset was further filtered to include only genes expressed by the cells in the subset (minimum cell threshold of 0.5% of cells in the subset). Subclustering was performed iteratively on each data subset to resolve additional cell types and subtypes. Briefly, clustering was run at high resolution, and the resulting clusters were ordered in a cluster dendrogram using the BuildClusterTree function in Seurat, which uses cluster-averaged PCs to calculate a PC distance matrix. Putative doublets/multiplets were identified based on expression of known enriched genes for cell types not in the cell subset (e.g., neuronal and glial-specific genes). Putative doublets tended to separate from other cells and cluster together, and these clusters were removed from the dataset.
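The subset gene filter and the cluster-averaged PC distance matrix described above can be illustrated with a minimal sketch. This is plain Python, not the Seurat code used in the protocol. The function names `filter_genes` and `cluster_pc_distances` are hypothetical, and the sketch assumes that a gene counts as "expressed" when its count is above zero, that the 0.5% threshold is inclusive, and that centroid distances are unweighted Euclidean (BuildClusterTree may weight PCs differently). All input values are toy data.

```python
import math


def filter_genes(counts, min_fraction=0.005):
    """Keep genes detected (count > 0) in at least min_fraction of cells.

    counts: dict mapping gene name -> list of per-cell counts for one subset.
    The inclusive threshold is an assumption, not taken from the protocol.
    """
    kept = {}
    for gene, values in counts.items():
        if not values:
            continue  # no cells in the subset, nothing to evaluate
        n_expressing = sum(1 for v in values if v > 0)
        if n_expressing / len(values) >= min_fraction:
            kept[gene] = values
    return kept


def cluster_pc_distances(pc_scores, labels):
    """Average PC scores per cluster, then measure distances between centroids.

    pc_scores: list of per-cell PC score lists; labels: per-cell cluster ids.
    Returns (sorted cluster ids, square distance matrix as nested lists).
    Unweighted Euclidean distance is an assumption made for illustration.
    """
    sums, sizes = {}, {}
    for scores, label in zip(pc_scores, labels):
        acc = sums.setdefault(label, [0.0] * len(scores))
        for i, s in enumerate(scores):
            acc[i] += s
        sizes[label] = sizes.get(label, 0) + 1
    clusters = sorted(sums)
    centroids = {c: [s / sizes[c] for s in sums[c]] for c in clusters}
    dist = [[math.dist(centroids[a], centroids[b]) for b in clusters]
            for a in clusters]
    return clusters, dist


# Toy example: 400 cells, so the 0.5% threshold corresponds to 2 cells.
toy_counts = {
    "gene_in_1_cell": [1] + [0] * 399,
    "gene_in_2_cells": [1, 1] + [0] * 398,
}
print(list(filter_genes(toy_counts)))  # ['gene_in_2_cells']

# Toy example: four cells, two PCs, two clusters.
clusters, dist = cluster_pc_distances(
    [[0, 0], [2, 0], [3, 4], [5, 4]], ["a", "a", "b", "b"]
)
print(dist[0][1])  # 5.0
```

A distance matrix of this kind is what a hierarchical clustering step would take as input to order clusters in a dendrogram; clusters with small centroid distances are the natural candidates to inspect for merging.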
Cluster separation was evaluated using the AssessNodes function and by inspecting the differentially expressed genes at each node. Clusters with poor separation, based on differential expression of mostly housekeeping genes or activity-dependent genes (ADGs; see Figure 2–figure supplement 1), were merged to avoid over-separation of the data. The dendrogram was reconstructed after merging or removal of clusters, and the process of inspecting and merging or removing clusters was repeated until all resulting clusters could be distinguished based on a set of differentially expressed genes that we could validate separately. To calculate the “ADG Score” (Figure 2–figure supplement 1), we used the AddModuleScore function in Seurat with a list of ADGs that were highly expressed in some of the MHb clusters (Fos, Fosb, Egr1, Junb, Nr4a1, Dusp18, Jun, Jund).
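To make the idea behind a module score such as the "ADG Score" concrete, the following is a simplified sketch in plain Python. It is not Seurat's AddModuleScore implementation: the function name `module_score`, its parameters, and the binning scheme are assumptions chosen for illustration. The general principle it shows is that each cell's score is the average expression of the gene set minus the average expression of control genes with similar overall expression. Expression values are toy data.

```python
import random


def module_score(expr, gene_set, n_bins=5, n_ctrl=2, seed=0):
    """Simplified per-cell module score (illustrative, not Seurat's code).

    expr: dict mapping gene name -> list of per-cell expression values.
    gene_set: genes of interest (e.g., activity-dependent genes).
    Score = mean expression of gene_set minus mean expression of control
    genes sampled from the same average-expression bin as each set gene.
    """
    rng = random.Random(seed)
    n_cells = len(next(iter(expr.values())))
    # Rank all genes by average expression and cut the ranking into bins.
    ranked = sorted(expr, key=lambda g: sum(expr[g]) / n_cells)
    bin_size = max(1, -(-len(ranked) // n_bins))  # ceiling division
    bin_of = {g: i // bin_size for i, g in enumerate(ranked)}

    present = [g for g in gene_set if g in expr]
    in_set = set(present)
    controls = []
    for g in present:
        # Candidate controls: genes in the same bin that are not in the set.
        pool = [c for c in ranked if bin_of[c] == bin_of[g] and c not in in_set]
        controls.extend(rng.sample(pool, min(n_ctrl, len(pool))))

    def cell_mean(genes, j):
        return sum(expr[g][j] for g in genes) / len(genes) if genes else 0.0

    return [cell_mean(present, j) - cell_mean(controls, j)
            for j in range(n_cells)]


# Toy example: two cells, two ADGs, two control genes, a single bin.
toy_expr = {
    "Fos": [4, 0],
    "Egr1": [2, 0],
    "ctrl1": [1, 1],
    "ctrl2": [3, 1],
}
scores = module_score(toy_expr, ["Fos", "Egr1"], n_bins=1, n_ctrl=2)
print(scores)  # [1.0, -1.0]
```

In this toy case the first cell expresses the ADGs above the control level and scores positive, while the second cell does not and scores negative. Subtracting matched controls keeps the score from simply tracking how many transcripts were captured in each cell.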

The code for the clustering methods described above is available here: https://github.com/mwall2017/habenula_indrops/tree/master/indrops

The metadata listing the cluster designation for each cell is available here: https://github.com/mwall2017/habenula_indrops

How to cite:

Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:

Wallace, M. L. (2021). Cell Clustering and cluster identification. Bio-protocol Preprint. bio-protocol.org/prep943.
Wallace, M. L., Huang, K. W., Hochbaum, D., Hyun, M., Radeljic, G. and Sabatini, B. L. (2020). Anatomical and single-cell transcriptional profiling of the murine habenular complex. eLife. DOI: 10.7554/eLife.51271