Diffusion Maps
This protocol is extracted from research article:
Tetranucleosome Interactions Drive Chromatin Folding
ACS Cent Sci, May 7, 2021; DOI: 10.1021/acscentsci.1c00085

Diffusion maps are a nonlinear manifold learning technique that has found extensive application in generating low-dimensional embeddings of high-dimensional molecular trajectories [41–43]. Assuming that the distance metric used to compare pairs of configurational microstates is a good proxy for short-time kinetic distance, and that the conformational dynamics over the state space may be approximated as a diffusion process, the leading collective variables of the diffusion map correspond to the large-scale, high-variance collective motions of the system, and kinetically close configurational microstates are embedded close together [24]. We employ the density-adaptive variant of diffusion maps, which we find to be particularly useful for handling the large inhomogeneities in sampling density observed in our chromatin simulations [44]. We provide a brief summary of the approach below and direct the reader to prior publications for mathematical and algorithmic details [24, 41–43].

Pairwise distances, $d_{ij}$, are calculated between data points $x_i$ and $x_j$ in our set; each distance is the RMSD between the translationally and rotationally aligned nucleosomal coordinates in frames $i$ and $j$ of the simulation. A Gaussian kernel is applied to $d_{ij}$ to construct a thresholded pairwise distance matrix A,

$$A_{ij} = \exp\left(-\frac{d_{ij}^{2\alpha}}{2\epsilon}\right),$$

where $\epsilon$ is the kernel bandwidth, which defines the local neighborhood of each point, and $\alpha$ is a parameter that globally rescales pairwise distances to smooth out large density fluctuations between densely and sparsely sampled regions of configurational state space [44]. Matrix A is then row-normalized to form the transition matrix,

$$M = D^{-1} A,$$

where D is a diagonal matrix with elements,

$$D_{ii} = \sum_{j} A_{ij}.$$
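The construction of the distance, kernel, and transition matrices can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kabsch_rmsd`, `pairwise_rmsd`, `transition_matrix`) are hypothetical, the alignment is done with the standard Kabsch algorithm, and the kernel is assumed to take the density-adaptive form exp(-d^(2α)/(2ε)), which should be checked against the original article.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_sites, 3) coordinate arrays after optimal
    translational and rotational alignment (Kabsch algorithm)."""
    P = P - P.mean(axis=0)  # remove translation by centering both sets
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)  # SVD of the 3x3 covariance matrix
    # If the best orthogonal transform is a reflection, flip the smallest
    # singular value so that only proper rotations are considered.
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1] = -S[-1]
    msd = (np.sum(P**2) + np.sum(Q**2) - 2.0 * S.sum()) / P.shape[0]
    return np.sqrt(max(msd, 0.0))  # guard against tiny negative round-off


def pairwise_rmsd(frames):
    """Symmetric matrix of aligned RMSDs for an array of shape
    (n_frames, n_sites, 3)."""
    n = len(frames)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = kabsch_rmsd(frames[i], frames[j])
    return d


def transition_matrix(d, eps, alpha):
    """Kernel matrix A, row sums D_ii, and row-normalized matrix M.

    Assumed kernel form: A_ij = exp(-d_ij**(2*alpha) / (2*eps))."""
    A = np.exp(-d ** (2.0 * alpha) / (2.0 * eps))
    D = A.sum(axis=1)   # diagonal elements D_ii = sum_j A_ij
    M = A / D[:, None]  # M = D^{-1} A, so every row sums to one
    return A, D, M
```

For example, `A, D, M = transition_matrix(pairwise_rmsd(frames), eps=1.0, alpha=0.5)` returns a row-stochastic `M` for any array `frames` of shape `(n_frames, n_sites, 3)`; the values of `eps` and `alpha` here are placeholders, not the settings used in the study. The double loop costs O(n²) alignments, so a long trajectory would typically be subsampled first.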

The transition matrix, M, is then diagonalized to calculate its eigenvectors $\psi_i$ and eigenvalues $\lambda_i$. By the Markov property, the top eigenvalue–eigenvector pair ($\psi_0 = \mathbf{1}$, $\lambda_0 = 1$) is trivial, corresponding to the steady-state distribution of a random walk. A gap in the eigenvalue spectrum after the $k$th nontrivial eigenvalue identifies the $k$ leading eigenvectors, which correspond to the leading high-variance nonlinear collective modes of the system. Snapshot $i$ of the molecular simulation trajectory is embedded into these collective variables, which span the so-called intrinsic manifold of the system, under the mapping,

$$x_i \mapsto \left(\psi_1(i), \psi_2(i), \ldots, \psi_k(i)\right).$$

The $\psi_k$ are the leading nonlinear collective variables identified by the diffusion map; they correspond to the high-variance dynamical modes of the system and are responsible for large-scale conformational rearrangements.
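The diagonalization and embedding steps described above can be sketched as follows. The function name `diffusion_map_spectrum` and the `max_k` cutoff are hypothetical, and choosing k as the position of the largest gap among the leading nontrivial eigenvalues is one simple heuristic assumed here; the source states only that a spectral gap is identified. The sketch diagonalizes the symmetric matrix D^(-1/2) A D^(-1/2), which shares its eigenvalues with M = D^(-1) A, and then converts its eigenvectors into right eigenvectors of M.

```python
import numpy as np


def diffusion_map_spectrum(A, max_k=10):
    """Eigenvalues and right eigenvectors of M = D^{-1} A, plus a
    gap-based estimate of the number k of leading nontrivial modes.

    A is a symmetric, positive kernel matrix with at least 3 rows."""
    d = A.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # S = D^{-1/2} A D^{-1/2} is symmetric and has the same eigenvalues as M
    S = A * inv_sqrt_d[:, None] * inv_sqrt_d[None, :]
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]  # sort eigenvalues in descending order
    evals = evals[order]
    # Right eigenvectors of M are psi = D^{-1/2} v
    psi = evecs[:, order] * inv_sqrt_d[:, None]
    psi = psi / psi[0, 0]  # rescale so the trivial eigenvector is all ones
    # Assumed heuristic: k is where the largest gap occurs among the
    # leading nontrivial eigenvalues lambda_1, lambda_2, ...
    lead = evals[1:max_k + 2]
    gaps = lead[:-1] - lead[1:]
    k = int(np.argmax(gaps)) + 1
    return evals, psi, k


# Illustrative usage on synthetic 1D data (two clusters), not chromatin data
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 3.0, 3.1, 3.2, 3.3, 3.4])
A = np.exp(-np.abs(x[:, None] - x[None, :]) ** 2 / 2.0)
evals, psi, k = diffusion_map_spectrum(A)
embedding = psi[:, 1:k + 1]  # row i holds (psi_1(i), ..., psi_k(i))
```

Working with the symmetric matrix allows the use of `numpy.linalg.eigh`, which returns real eigenvalues and is numerically better behaved than a general eigensolver applied to the non-symmetric M.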

Free energy surfaces over the intrinsic manifold, $G(\Psi)$, are computed by collecting histogram approximations $\hat{P}(\Psi)$ to the observed distribution of configurational microstates projected onto the leading $k$ eigenvectors $\Psi = \{\psi_i\}_{i=1}^{k}$ and then inverting this distribution using the relation

$$\beta G(\Psi) = -\ln \hat{P}(\Psi) + C,$$

where $\beta = 1/(k_\mathrm{B}T)$ is the inverse temperature and C is an arbitrary additive constant that sets an absolute free energy scale [45]. By virtue of the interpretability of the eigenvectors as the leading collective modes of the system, the free energy surface constructed over the intrinsic manifold can resolve both the metastable macrostates of the chromatin structure and the interconversion pathways between them [24]. Diffusion maps have already been used successfully to examine the dynamics of DNA around histone proteins, which provides precedent for our approach [43], but we note that we could have employed tICA, VAMPnets, or SRVs in conjunction with Markov-state models to identify kinetic microstates and macrostates [46–50]. These approaches have the benefit of furnishing kinetic networks without requiring the assumption of diffusive dynamics. In the present work, it is the structure and thermodynamics of the metastable states that are of primary interest, as opposed to the kinetic transition rates, and for this reason we favor the smooth, continuous, and more structurally interpretable free energy surfaces furnished by diffusion maps.
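The histogram inversion used to build the free energy surface can be sketched as below. This is an illustrative example only: the function name `free_energy_surface` and the default bin count are assumptions, and the additive constant C is fixed here by setting the lowest free energy to zero, which is one common convention rather than a choice stated in the source.

```python
import numpy as np


def free_energy_surface(embedding, bins=50):
    """Estimate beta*G (in units of k_B T) on a regular grid.

    embedding : array of shape (n_frames, k) holding the diffusion-map
                coordinates of every snapshot.
    Returns the grid of beta*G values and the list of bin edges."""
    hist, edges = np.histogramdd(embedding, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        beta_G = -np.log(hist)  # unvisited bins become +inf
    # Fix the arbitrary constant C by shifting the global minimum to zero
    beta_G -= beta_G[np.isfinite(beta_G)].min()
    return beta_G, edges
```

Bins that are never visited carry an infinite free energy in this sketch; in practice they are usually masked when plotting. The bin count trades resolution against statistical noise and would need to be tuned to the amount of sampling available.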
