Dynamics graph and underlying gene regulatory networks inference

Yumin Zheng
Jonas C. Schupp
Taylor Adams
Geremy Clair
Aurelien Justet
Farida Ahangari
Xiting Yan
Paul Hansen
Marianne Carlon
Emanuela Cortesi
Marie Vermant
Robin Vos
Laurens J. De Sadeleer
Ivan O Rosas
Ricardo Pineda
John Sembrat
Melanie Königshoff
John E. McDonough
Bart M. Vanaudenaerde
Wim A. Wuyts
Naftali Kaminski
Jun Ding

UNAGI builds a dynamic graph to illustrate how each cell population (cell type or subtype) changes throughout disease progression. We apply Leiden clustering [117] to the latent embeddings generated by the Graph VAE-GAN to identify distinct cell populations at each disease stage. To measure distances between cell populations in adjacent stages, we use the KL divergence rather than the Euclidean distance, which can be problematic in high-dimensional data [118,119]. For each cell population (e.g., cell type), we approximate its distribution with a Monte Carlo sampling strategy [120]: each dimension of the latent embeddings is sampled a thousand times, and the samples are used to form a multivariate normal distribution. The KL divergence between these multivariate normal distributions is then taken as the distance between the corresponding populations.
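The distance computation described above can be sketched as follows. This is a minimal illustration, not the UNAGI implementation: the function names `fit_gaussian` and `kl_gaussians` are hypothetical, the resampling scheme (drawing each latent dimension independently from the population's cells) is an assumption about how the sampling is done, and a small `eps` is added to the covariance diagonal purely for numerical stability. The KL divergence itself uses the standard closed form for two multivariate normals.

```python
import numpy as np

def fit_gaussian(latent, n_samples=1000, rng=None, eps=1e-6):
    """Approximate one cell population by a multivariate normal.

    latent: array of shape (n_cells, n_dims) with the population's embeddings.
    Assumption: every latent dimension is resampled independently n_samples times.
    """
    rng = np.random.default_rng(rng)
    n_cells, n_dims = latent.shape
    # idx[i, j] picks the cell whose j-th coordinate goes into sample i.
    idx = rng.integers(0, n_cells, size=(n_samples, n_dims))
    samples = latent[idx, np.arange(n_dims)]
    mean = samples.mean(axis=0)
    # eps keeps the covariance invertible (an assumption, for stability only).
    cov = np.cov(samples, rowvar=False) + eps * np.eye(n_dims)
    return mean, cov

def kl_gaussians(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(P || Q) between two multivariate normal distributions."""
    d = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )
```

Note that the KL divergence is asymmetric, so the direction in which it is evaluated between an earlier-stage and a later-stage population matters; the text does not state which direction is used.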

Additionally, we identify the top 100 differentially expressed genes (DEGs) in each cell population and calculate DEG distances among cell populations across stages. The DEG distance is defined as $J_d(\mathrm{DEG}_{c_1}, \mathrm{DEG}_{c_2}) \times \sum_{j \in \mathrm{DEG}_{c_1}} |R_j^{c_1} - R_j^{c_2}|$, where the first term is the Jaccard distance between $\mathrm{DEG}_{c_1}$ and $\mathrm{DEG}_{c_2}$, the DEG lists of the two cell populations, and the second term captures the ranking difference between the two lists. Here, $R_j^{c_1}$ and $R_j^{c_2}$ are the rankings of gene $j$ in $\mathrm{DEG}_{c_1}$ and $\mathrm{DEG}_{c_2}$, respectively.

To make the KL divergence and the DEG distance comparable, we apply min-max normalization to each metric across all potential connections of a given cluster. After normalization, the distance between each cluster pair is the sum of the normalized KL divergence and the normalized DEG distance. We then compile these normalized distances for all possible connections across the disease stages to create a background distance distribution, which is used to assess the statistical significance of connections between clusters in different stages. When a cluster is connected to more than one cluster in an adjacent stage, the most statistically significant connection is used. These significant connections form tracks that run from the control stage to the final stage of the disease and define the disease progression. The resulting dynamic graph $G_{dynamic}$ consists of these progression tracks, each representing the cellular state transitions of a specific cell population during disease progression.
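The DEG distance, the normalization, and the comparison against the background distribution can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: all function names are hypothetical, ranks are taken as 1-based positions in each ranked DEG list, a gene absent from the second list is assigned the rank one past its end (the text does not specify this case), and significance is approximated by an empirical lower-tail p-value, which is only one plausible reading of the significance test.

```python
import numpy as np

def deg_distance(deg_a, deg_b):
    """Jaccard distance between two DEG lists times their summed rank difference.

    deg_a, deg_b: non-empty gene lists ordered from most to least significant.
    Assumption: a gene of deg_a missing from deg_b gets rank len(deg_b) + 1.
    """
    set_a, set_b = set(deg_a), set(deg_b)
    jaccard = 1.0 - len(set_a & set_b) / len(set_a | set_b)
    rank_b = {gene: rank for rank, gene in enumerate(deg_b, start=1)}
    missing_rank = len(deg_b) + 1
    rank_diff = sum(
        abs(rank - rank_b.get(gene, missing_rank))
        for rank, gene in enumerate(deg_a, start=1)
    )
    return jaccard * rank_diff

def min_max(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def combined_distances(kl_values, deg_values):
    """Sum of the normalized KL and DEG distances over one cluster's candidates."""
    return min_max(kl_values) + min_max(deg_values)

def empirical_p_values(distances, background):
    """Fraction of background distances that are <= each observed distance.

    Assumption: smaller distances are more significant (lower-tail test).
    """
    background = np.sort(np.asarray(background, dtype=float))
    counts = np.searchsorted(background, distances, side="right")
    return counts / background.size
```

Under these assumptions, the candidate in the adjacent stage with the smallest p-value would be kept as the connection for that cluster, for example via `np.argmin(empirical_p_values(d, background))`.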

Moreover, we employ iDREM (Interactive Dynamic Regulatory Events Miner) [121], a machine learning model based on an Input-Output Hidden Markov Model, to reconstruct the temporal gene regulatory network underlying each track (i.e., associated with each cell population) in the reconstructed cellular dynamics graph $G_{dynamic}$. This gene regulatory network consists of co-expressed genes and the gene regulators that drive the temporal progression of the disease within each cell population. For each track in $G_{dynamic}$, iDREM identifies groups of genes that follow similar expression patterns throughout disease progression, termed gene paths; some paths show increasing expression and others decreasing expression. For each co-expressed gene path, iDREM also reports its enriched GO terms and pathways. Beyond identifying co-expressed gene paths, iDREM captures the gene regulators that modulate those paths during disease progression. The dynamic genes and gene regulators identified through this process are considered dynamic marker candidates and potential therapeutic targets for the disease.
