Data visualization, clustering and diffusion maps

Ane Iturbide
Mayra L. Ruiz Tejeda Segura
Camille Noll
Kenji Schorpp
Ina Rothenaigner
Elias R. Ruiz-Morales
Gabriele Lubatti
Ahmed Agami
Kamyar Hadian
Antonio Scialdone
Maria-Elena Torres-Padilla

We used UMAP (ref. 60) for data visualization (‘umap’ function in scanpy, with options n_components=2, min_dist=1). Leiden clustering was performed on the top 3,000 highly variable genes (HVGs) calculated across the whole dataset (with k = 15 and resolution = 0.4), using a correlation distance in the ‘pp.neighbors’ function from scanpy. To identify marker genes for a given cluster, we first found genes differentially expressed between that cluster and any other cluster (Wilcoxon’s rank sum test, false discovery rate (FDR) < 0.1, log2 fold change (log2FC) > 1), and then ranked these genes by their mean FDR computed across all pairwise comparisons. To validate the differentiation state of the clusters suggested by the markers, we plotted the expression of previously known relevant genes (Rex1, Sox2, Nanog, Tcstv1, Zscan4a, Zscan4c, Zscan4d, Zscan4e, Gata6, Meis1, Sox17 and Sox7) on the UMAP. Cells were aligned along a pseudotime trajectory using a diffusion map (ref. 61), computed with the ‘diffmap’ function from the scanpy package on the first 20 principal components. All differential gene expression analyses used Wilcoxon’s rank sum test, with an FDR threshold of 0.1 and a log2FC threshold of 1.
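The marker-ranking step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function name `rank_markers`, the input layout, the `require_all` flag and the toy numbers are all assumptions made for illustration. In particular, the text does not say whether a gene had to pass the thresholds in every pairwise comparison or in at least one, so the sketch exposes both readings and defaults to the stricter one.

```python
from statistics import mean

def rank_markers(pairwise, fdr_max=0.1, log2fc_min=1.0, require_all=True):
    """Rank candidate marker genes for one cluster (hypothetical helper).

    pairwise maps each other cluster to {gene: (fdr, log2fc)}, i.e. the
    result of one Wilcoxon comparison "cluster of interest vs. other".
    """
    comparisons = list(pairwise.values())
    # Only consider genes that were tested in every pairwise comparison.
    genes = set.intersection(*(set(stats) for stats in comparisons))

    ranked = []
    for gene in genes:
        passes = [
            stats[gene][0] < fdr_max and stats[gene][1] > log2fc_min
            for stats in comparisons
        ]
        # Assumption: by default a marker must pass in all comparisons.
        keep = all(passes) if require_all else any(passes)
        if keep:
            mean_fdr = mean(stats[gene][0] for stats in comparisons)
            ranked.append((mean_fdr, gene))

    ranked.sort()  # smallest mean FDR first
    return [(gene, mean_fdr) for mean_fdr, gene in ranked]

# Toy input with made-up values, not data from the study.
toy = {
    "cluster_1": {"GeneA": (0.01, 2.0), "GeneB": (0.05, 1.5), "GeneC": (0.20, 3.0)},
    "cluster_2": {"GeneA": (0.03, 1.2), "GeneB": (0.001, 2.5), "GeneC": (0.01, 2.0)},
}
markers = rank_markers(toy)
markers_any = rank_markers(toy, require_all=False)
```

With the toy input, GeneC is dropped under the strict reading because its FDR exceeds 0.1 in one comparison, while `require_all=False` keeps it and ranks it last by mean FDR. In practice the per-comparison statistics would come from a differential expression routine rather than being typed by hand.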
