
We used the Barnes-Hut t-SNE[4] algorithm to project the tissue samples onto a 2D map (t-SNE map) based on the expression profiles of 16,142 genes (Fig 1A). Barnes-Hut t-SNE is a non-linear method that preserves local similarities between samples at the expense of faithfully representing the similarities between dissimilar samples. This contrasts with methods such as PCA and MDS, which apply the same linear mapping to all data. As a result, t-SNE better preserves local (dis)similarities, because they are not compressed by the large dissimilarities present in the data set. t-SNE learns the embedding by minimizing, with respect to the positions of the samples in the 2D map, the Kullback–Leibler (KL) divergence between the distribution of pairwise similarities in the high-dimensional space and the distribution of pairwise similarities in the 2D map (similarities are based on Euclidean distances). In the high-dimensional space, the similarity of a sample to all other samples is modelled with a Gaussian whose width is set by the perplexity parameter, which can be read as the effective number of neighbours taken into account. In the 2D map, similarities are modelled with a Student t-distribution. Its heavy tails let dissimilar samples lie far apart without compressing the rest of the map, so local similarities are better preserved. We ran Barnes-Hut t-SNE 1000 times and selected the solution with the lowest KL divergence. Note that Barnes-Hut t-SNE is an accelerated approximation of t-SNE that reduces the cost of each gradient evaluation from O(N²) to O(N log N) in the number of samples N, which makes it practical for data sets with many samples and many features.
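For reference, the standard t-SNE formulation that the paragraph above describes in words can be written as follows. The notation is ours, not the article's: $x_i$ is the expression profile of sample $i$, $y_i$ its position in the 2D map, $N$ the number of samples, and $\sigma_i$ the Gaussian bandwidth chosen for sample $i$ so that its perplexity matches the user-set value.

$$
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad \mathrm{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_{j \neq i} p_{j\mid i} \log_2 p_{j\mid i},\\
p_{ij} &= \frac{p_{j\mid i} + p_{i\mid j}}{2N},
\qquad q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}},\\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad \frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}.
\end{aligned}
$$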
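The sketch below illustrates the objective and the best-of-N restart strategy on toy data, assuming NumPy. It deliberately swaps in *exact* t-SNE (full O(N²) gradient, plain gradient descent, no momentum or early exaggeration) for Barnes-Hut t-SNE, so it is only suitable for small inputs. The function names, learning rate, iteration count and toy data are illustrative assumptions, not the settings used in the study. Restarts differ only in their random initialization, and the run with the lowest KL divergence is kept.

```python
import numpy as np


def conditional_p(d2, perplexity, tol=1e-5, max_iter=100):
    """Gaussian p_{j|i} for one sample, given squared distances d2 to all
    *other* samples. The precision beta = 1 / (2 sigma_i^2) is found by
    bisection so that exp(H), with H in nats, equals the target perplexity
    (equivalent to 2**H with H in bits)."""
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-(d2 - d2.min()) * beta)  # shift by the minimum for stability
        p /= p.sum()
        h = -np.sum(p * np.log(p + 1e-12))
        if abs(h - target) < tol:
            break
        if h > target:  # too flat: sharpen the Gaussian
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:  # too peaked: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2.0
    return p


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def joint_p(X, perplexity):
    """Symmetrised high-dimensional affinities p_ij = (p_{j|i} + p_{i|j}) / 2N."""
    n = X.shape[0]
    D2 = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        P[i, others] = conditional_p(D2[i, others], perplexity)
    return (P + P.T) / (2.0 * n)


def student_t_q(Y):
    """Low-dimensional affinities q_ij from a heavy-tailed Student t kernel."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W


def kl_divergence(P, Q):
    """KL(P || Q), summed over the pairs with p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], 1e-12))))


def exact_tsne(P, n_iter=500, learning_rate=2.0, seed=0):
    """Plain gradient descent on KL(P || Q) with the exact gradient
    dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(P.shape[0], 2))
    for _ in range(n_iter):
        Q, W = student_t_q(Y)
        M = (P - Q) * W
        # (diag(row sums of M) - M) @ Y computes sum_j M_ij (y_i - y_j) for every i
        grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
        Y -= learning_rate * grad
    Q, _ = student_t_q(Y)
    return Y, kl_divergence(P, Q)


def best_of_restarts(X, perplexity=5.0, n_restarts=5, **kwargs):
    """Run t-SNE from several random seeds and keep the lowest-KL solution."""
    P = joint_p(X, perplexity)
    runs = [exact_tsne(P, seed=s, **kwargs) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy data: three groups of 10 "samples" x 50 "genes".
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(10, 50)) for c in (0.0, 5.0, 10.0)])
    Y, kl = best_of_restarts(X, perplexity=5.0, n_restarts=3)
    print(Y.shape, round(kl, 3))
```

For real data, an established Barnes-Hut implementation (for example scikit-learn's `TSNE` with `method="barnes_hut"`, which reports the final cost in its `kl_divergence_` attribute) would replace the hand-written optimizer inside the same best-of-N loop.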

(A) Projection of 1641 GTEx samples onto a 2D map. Each point represents a sample and is coloured according to its (sub)tissue label (45 in total). Samples that cluster outside the cluster of their matching tissue subtype are circled in red. Sample clusters are indicated by 22 differently coloured density maps. (B) Projection of the 313 brain samples and their associated brain tissues. (C) Comparison of the brain-sample clusterings obtained with the HC approach and with the t-SNE approach, which yields a cophenetic correlation[6] of 0.68. Edges link the position of each sample ID in the HC clustering to its position in the t-SNE clustering and are coloured by brain tissue region. A cluster is labelled when a particular brain region is significantly overrepresented in it (hypergeometric test, P≤0.001).
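The caption does not state exactly how the two clusterings were compared. One common reading, which is an assumption here, is that a hierarchical clustering is built both on the expression profiles and on the t-SNE coordinates, and the cophenetic correlation is the Pearson correlation between the two resulting cophenetic distance matrices (entry (i, j) is the height at which samples i and j are first merged). The NumPy sketch below uses average linkage (UPGMA), Euclidean distances and hypothetical random data; the linkage method, metric and function names are illustrative choices, not the study's. In practice, `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.cophenet` compute the same quantities far more efficiently.

```python
import numpy as np


def euclidean(Z):
    """Pairwise Euclidean distance matrix between the rows of Z."""
    return np.sqrt(np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1))


def average_linkage_cophenetic(D):
    """Cophenetic distance matrix of a UPGMA (average-linkage) dendrogram.

    D is a symmetric (n, n) distance matrix. Entry (i, j) of the result is
    the merge height at which samples i and j first share a cluster.
    Naive agglomeration, suitable for a few dozen samples only.
    """
    n = D.shape[0]
    clusters = {i: [i] for i in range(n)}
    C = np.zeros((n, n))
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for a_pos, a in enumerate(keys):
            for b in keys[a_pos + 1:]:
                # average linkage: mean of all cross-cluster distances
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                C[i, j] = C[j, i] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    return C


def cophenetic_correlation(C1, C2):
    """Pearson correlation between the upper triangles of two cophenetic matrices."""
    iu = np.triu_indices_from(C1, k=1)
    return float(np.corrcoef(C1[iu], C2[iu])[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(12, 100))  # hypothetical expression profiles
    coords = rng.normal(size=(12, 2))  # hypothetical t-SNE coordinates
    c_hc = average_linkage_cophenetic(euclidean(expr))
    c_tsne = average_linkage_cophenetic(euclidean(coords))
    print(round(cophenetic_correlation(c_hc, c_tsne), 2))
```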
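The overrepresentation test in the caption can be sketched as a one-sided hypergeometric tail probability: if a cluster of a given size were drawn at random without replacement from all samples, how likely is it to contain at least the observed number of samples from one brain region? The function name and the example counts below are hypothetical and only illustrate the calculation. SciPy users would typically call `scipy.stats.hypergeom.sf(k - 1, population, region_total, cluster_size)` instead.

```python
from math import comb


def overrepresentation_p(k, cluster_size, region_total, population):
    """One-sided hypergeometric p-value P(X >= k).

    X counts how many of the `cluster_size` samples in a cluster come from a
    given brain region, when `region_total` of the `population` samples
    belong to that region and cluster membership is drawn without replacement.
    """
    upper = min(cluster_size, region_total)
    tail = sum(
        comb(region_total, x) * comb(population - region_total, cluster_size - x)
        for x in range(k, upper + 1)
    )
    # Exact integer arithmetic up to this single division.
    return tail / comb(population, cluster_size)


if __name__ == "__main__":
    # Hypothetical counts: a 20-sample cluster containing 15 samples from a
    # region that accounts for 30 of 313 brain samples.
    p = overrepresentation_p(k=15, cluster_size=20, region_total=30, population=313)
    print(p <= 0.001)  # True: below the caption's significance threshold
```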
