4.7. Visualization

Uthsav Chitra; Brian J. Arnold; Hirak Sarkar; Cong Ma; Sereno Lopez-Darwin; Kohei Sanno; Benjamin J. Raphael


This method is extracted from research article: bioRxiv, Oct 2023

Mapping the topography of spatial gene expression with interpretable deep learning

DOI: 10.1101/2023.10.10.561757


The neural network in GASTON learns an isodepth $d(x, y)$ that varies smoothly across a tissue slice $T$; however, the scale of the learned isodepth is arbitrary. To make the isodepth easier to interpret, we rescale it within each spatial domain to reflect approximate physical distances inside that domain. Briefly, we derive an estimate $γ_{p}$ of the “average width” of each spatial domain $R_{p}$ in μm, and we linearly transform the isodepth $d(x, y)$ in each spatial domain such that the range of isodepth values in domain $R_{p}$ is $γ_{p}$.

We scale the isodepth in each spatial domain as follows. Given the isodepth $d(x, y)$, spatial domains $R_{1}, \dots, R_{P}$, and breakpoints $b_{1}, \dots, b_{P-1}$ estimated from (10) and (11), we assume without loss of generality that the isodepth is linearly transformed such that $\min_{(x, y) \in T} d(x, y) = 0$ and $\max_{(x, y) \in T} d(x, y) = 1$, i.e., the breakpoints satisfy $b_{0} = 0 < b_{1} < \dots < b_{P-1} < 1 = b_{P}$, where we set $b_{0} = 0$ and $b_{P} = 1$ for convenience. For each spatial domain $R_{p}$, let $γ_{p}$ be the average width of the domain, whose computation we describe below. We compute the “scaled” isodepth $\tilde{d}(x, y)$ as

$$\tilde{d}(x, y) = e_{p} \cdot d(x, y) + f_{p} \quad \text{if } b_{p-1} \le d(x, y) \le b_{p}, \qquad p = 1, \dots, P,$$

where $e_{p}, f_{p}$ are chosen such that $\tilde{d}(x, y)$ is continuous, and $\tilde{d}(x, y) = \sum_{q=1}^{p} γ_{q}$ if $d(x, y) = b_{p}$ for $p = 1, \dots, P$. With this choice of $e_{p}, f_{p}$, the range of scaled isodepth values $\tilde{d}(x, y)$ in a spatial domain $R_{p}$ is given by

$$\max_{(x, y) \in R_{p}} \tilde{d}(x, y) - \min_{(x, y) \in R_{p}} \tilde{d}(x, y) = \sum_{q=1}^{p} γ_{q} - \sum_{q=1}^{p-1} γ_{q} = γ_{p}.$$

That is, the range of isodepth values $\tilde{d} (x, y)$ in each spatial domain is the average width $γ_{p}$ of the domain $R_{p}$ .
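The rescaling described above is a continuous piecewise-linear map whose knots are the breakpoints, so it can be sketched with linear interpolation. The following is a minimal illustration rather than the GASTON implementation: the function name `scale_isodepth` and its arguments are hypothetical, the isodepth is assumed to be already normalized to $[0, 1]$, and the scaled isodepth is assumed to equal 0 at $d = b_{0} = 0$ (which the stated range of $γ_{1}$ in the first domain implies).

```python
import numpy as np

def scale_isodepth(d, breakpoints, widths):
    """Piecewise-linear rescaling of a normalized isodepth (illustrative sketch).

    d           : isodepth values, assumed already normalized to [0, 1]
    breakpoints : interior breakpoints b_1 < ... < b_{P-1}, strictly inside (0, 1)
    widths      : average widths gamma_1, ..., gamma_P, one per spatial domain
    """
    d = np.asarray(d, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Knots b_0 = 0 < b_1 < ... < b_{P-1} < 1 = b_P.
    knots = np.concatenate(([0.0], np.asarray(breakpoints, dtype=float), [1.0]))
    if len(widths) != len(knots) - 1:
        raise ValueError("need exactly one width per spatial domain")
    if np.any(np.diff(knots) <= 0):
        raise ValueError("breakpoints must be strictly increasing inside (0, 1)")
    # Scaled isodepth at knot b_p is gamma_1 + ... + gamma_p (assumed 0 at b_0).
    targets = np.concatenate(([0.0], np.cumsum(widths)))
    # Linear interpolation between knots gives slope gamma_p / (b_p - b_{p-1})
    # inside domain p, which plays the role of e_p in the text.
    return np.interp(d, knots, targets)
```

For example, with a single breakpoint at 0.5 and widths of 200 and 100, an isodepth of 0.25 maps to 100 and an isodepth of 0.75 maps to 250. Values of `d` outside $[0, 1]$ are clamped to the end values by `np.interp`.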

We estimate the average width $γ_{p}$ of each spatial domain $R_{p}$ by computing the median physical distance between the two boundaries of the domain $R_{p}$. Specifically, let $Γ_{lower} = \{(x_{i}, y_{i}) \in R_{p} : b_{p-1} < d(x_{i}, y_{i}) < b_{p-1} + ϵ\}$ and $Γ_{upper} = \{(x_{i}, y_{i}) \in R_{p} : b_{p} - ϵ' < d(x_{i}, y_{i}) < b_{p}\}$ be the sets of spatial locations on the lower and upper boundary curves of the spatial domain $R_{p}$, respectively. We set $γ_{p}$ to be the median, over all spots $(x, y) \in Γ_{lower}$, of the distance from $(x, y)$ to the closest spot in $Γ_{upper}$. We choose $ϵ, ϵ'$ such that $Γ_{lower}$ and $Γ_{upper}$ visually correspond to the spatial domain boundaries.
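The width estimate can be sketched as a nearest-neighbor computation between the two boundary sets. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `estimate_domain_width` and its arguments are hypothetical, and membership in $R_{p}$ is assumed to follow from the isodepth thresholds alone.

```python
import numpy as np

def estimate_domain_width(coords, d, b_lo, b_hi, eps_lo, eps_hi):
    """Median nearest-neighbor distance between the two boundaries of a domain.

    coords         : (n_spots, 2) array of spot coordinates
    d              : (n_spots,) isodepth values
    b_lo, b_hi     : breakpoints b_{p-1} and b_p bounding the domain
    eps_lo, eps_hi : tolerances (epsilon and epsilon' in the text)
    """
    coords = np.asarray(coords, dtype=float)
    d = np.asarray(d, dtype=float)
    # Spots just above the lower breakpoint and just below the upper breakpoint.
    lower = coords[(d > b_lo) & (d < b_lo + eps_lo)]
    upper = coords[(d > b_hi - eps_hi) & (d < b_hi)]
    if len(lower) == 0 or len(upper) == 0:
        raise ValueError("a boundary set is empty; try larger tolerances")
    # Pairwise Euclidean distances, shape (n_lower, n_upper).
    diff = lower[:, None, :] - upper[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # For each lower-boundary spot, distance to its closest upper-boundary spot.
    nearest = dists.min(axis=1)
    return float(np.median(nearest))
```

The dense pairwise distance matrix is fine for boundary sets of a few thousand spots; for larger sets a spatial index such as `scipy.spatial.cKDTree` would avoid the quadratic memory cost.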

For 10x Genomics Visium data, we multiply each average width $γ_{p}$ by 100, since the physical distance between the centers of adjacent spots on a 10x Visium slide is 100 μm. For Slide-seqV2 data, we multiply each average width $γ_{p}$ by 64/100, since two beads that are 100 pixels apart in the Slide-seqV2 microscopy image are roughly 64 μm apart [¹¹⁶].
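The unit conversion amounts to one multiplicative factor per platform. A small sketch follows, where the dictionary keys and the function name `width_in_um` are hypothetical labels chosen for illustration; only the two factors come from the text.

```python
# Micrometers per coordinate unit, using the factors stated in the text.
# The platform keys are illustrative labels, not identifiers from any library.
UM_PER_UNIT = {
    "visium": 100.0,             # adjacent spot centers are 100 um apart
    "slide-seqv2": 64.0 / 100.0, # 100 pixels correspond to roughly 64 um
}

def width_in_um(width, platform):
    """Convert an average width from coordinate units to micrometers."""
    try:
        return width * UM_PER_UNIT[platform]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}") from None
```

For instance, a Visium domain that is 3 spot spacings wide corresponds to about 300 μm, and a Slide-seqV2 domain that is 250 pixels wide corresponds to about 160 μm.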

To simplify the visualization of the 1-D expression functions $h$ , we aggregate the counts $a_{i, g}$ for spots $(x_{i}, y_{i})$ with approximately equal isodepth values $d (x_{i}, y_{i})$ , as in [⁸³]. Specifically, we partition the range of isodepth values into a union $B_{1} \cup \dots \cup B_{M}$ of intervals $B_{j}$ , and we compute the total expression value ${\tilde{a}}_{j, g} = \sum_{i : d (x_{i}, y_{i}) \in B_{j}} a_{i, g}$ for gene $g$ in each interval $B_{j}$ . We call ${\tilde{a}}_{j, g}$ the pooled expression value of gene $g$ at pooled spot $j$ . Pooling does not affect inference of the 1-D expression function $h$ in the STP, as the function $h$ obtained by maximizing the log-likelihood (9) with pooled data is equal to the function obtained by maximizing (9) with the original data, as shown in [⁸³].
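Pooling can be sketched as binning spots by isodepth and summing their counts within each bin. The text does not specify how the intervals $B_{j}$ are chosen, so the equal-width bins below are an assumption, and the function name `pool_counts` is hypothetical.

```python
import numpy as np

def pool_counts(counts, d, n_bins):
    """Sum per-spot counts over spots whose isodepth falls in the same interval.

    counts : (n_spots, n_genes) array of counts a_{i,g}
    d      : (n_spots,) isodepth values
    n_bins : number of intervals M (equal-width bins are an assumption here)
    """
    counts = np.asarray(counts, dtype=float)
    d = np.asarray(d, dtype=float)
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    edges = np.linspace(d.min(), d.max(), n_bins + 1)
    # Bin index in 0..n_bins-1; only interior edges are searched, so the
    # maximum isodepth value lands in the last bin rather than out of range.
    bin_idx = np.searchsorted(edges[1:-1], d, side="right")
    pooled = np.zeros((n_bins, counts.shape[1]))
    # Unbuffered add so that spots sharing a bin accumulate correctly.
    np.add.at(pooled, bin_idx, counts)
    return pooled, edges, bin_idx
```

Applying the same function to a one-column array of per-spot total UMI counts gives the pooled depth of each pooled spot, and the sum of every column is preserved by construction.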

We plot expression as log pooled counts per million (CPM) $\log({\tilde{a}}_{j, g} / {\tilde{D}}_{j} \cdot 10^{6} + 1)$, where ${\tilde{D}}_{j}$ is the sum of the total UMI counts across all spots in the $j$th pooled spot. The log pooled CPM has approximately the same scale as the expression function $h_{g}(w) + \log(10^{6})$ for each gene $g$.
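The plotted quantity can be computed directly from the pooled counts and pooled depths. A minimal sketch follows, assuming the logarithm is the natural logarithm (the text does not state the base); the function name `log_pooled_cpm` is hypothetical.

```python
import numpy as np

def log_pooled_cpm(pooled_counts, pooled_depth):
    """Log pooled counts per million: log(a_tilde / D_tilde * 1e6 + 1).

    pooled_counts : (n_pooled, n_genes) pooled expression values
    pooled_depth  : (n_pooled,) total UMI counts summed over the spots in
                    each pooled spot (over all genes, not only those plotted)
    """
    pooled_counts = np.asarray(pooled_counts, dtype=float)
    pooled_depth = np.asarray(pooled_depth, dtype=float)
    if np.any(pooled_depth <= 0):
        raise ValueError("every pooled spot needs a positive total UMI count")
    # Natural log is an assumption; the +1 keeps zero counts finite.
    return np.log(pooled_counts / pooled_depth[:, None] * 1e6 + 1.0)
```

A gene with zero pooled counts in a pooled spot maps to exactly 0 on this scale.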

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
