Descriptors of atomic environments

AG Andrea Gardin
CP Claudio Perego
GD Giovanni Doni
GP Giovanni M. Pavan
ask Ask a question
Favorite

Let a generic output of an MD simulation be the ensemble of system conformations sampled at a series of MD timesteps, represented by the atomic coordinates R(t) of the molecular species in the system at timestep t. R is a 3N-dimensional configuration vector, where N is the number of particles. A generic descriptor is a mapping from the 3N-dimensional coordinate space to a D-dimensional feature space, that associates a feature vector to each of the sampled conformations R(t). We aim at a descriptor capable to characterize and compare the atomic/molecular environment surrounding the multiple constituents of a soft, supramolecular system. A first, crucial requirement of this mapping is that it preserves physical symmetries such as permutation, translation and rotation invariance, ensuring that physically equivalent configurations are recognized as such by the descriptor65. Other less obvious properties, required for a useful representation of the atomic/molecular environment are e.g., smoothness, additivity and level of locality; SOAP descriptors satisfy all these requirements 66,67.

The SOAP49 is an atom-centered descriptor, that accurately reproduces density correlation features of many-body systems. SOAP were introduced in ref. 49 as new bond-order parameters, able to efficiently account for radial and angular information of the environment that surrounds atoms or molecules41,49,66. The SOAP descriptor of a system of N particles describes the atomic/molecular surroundings of a selected set of M coordinates of the system components, which are referred to as the “centers” of the SOAP vector. These M centers can include the position of every single atom of the system, as well as selections or combinations (as e.g., center of geometry/mass) of them. In the following, we comment on the choice of the SOAP center for each of the studied systems, but in general we associate a single SOAP vector per each monomer (see also “Results and discussion”). In this work we have considered only single-specie systems, but the approach is generalizable to multiple species.

The SOAP descriptor is built from the density of neighboring centers j that surround the i-th center, namely

where a Gaussian function is associated to each neighboring center (located at distance r = rij from the i-th center), to build a smooth density function. The σ parameter sets the width of the Gaussian located at the j-th neighboring center. The total neighbor density ρ = ∑iρi is retrieved by summing all contributions. The function frcut smoothly goes from 1 to 0 at rcut, so that the environment of each center extends up to a fixed cutoff rcut. Starting from Equation (1), which is intrinsically invariant for the permutation of centers, the SOAP descriptors are defined by incorporating the translation and rotation invariances. This is done in two steps: (i) Eq. 1 is expanded in the basis of orthonormal radial functions Rn(r) and spherical harmonics Yl,m(r^); (ii) rotational invariance is enforced by building symmetrized combinations of the expansion coefficients49,66.

We here employ the second-order SOAP descriptor, also called SOAP power spectrum (in analogy with the Fourier analysis), which can be written as

where cnlmi are the expansion coefficients of the particle density surrounding the ith-center. The full SOAP descriptor associated to the i-th center is a vector including all contributions from Eq. 2,

where n and n range from 1 to nmax and l ranges from 1 to lmax, setting the dimension D of the vector (in cases of multiple species D depends also on the number of species)41,49. To compute Eq. 2 we used the python package DScribe68, using nmax, lmax = 8 and three different cutoff values rcut = 0.8, 1.6, 3.0 nm. The remaining parameters were set to default values of the DScribe library.

To summarize, for a given 3N-dimensional configuration vector R sampled via CG-MD, we define a set of M centers {i}, and for each center we compute the SOAP vector {pi}. This generates a dataset of M SOAP vectors that describe the structural arrangement of the centers in the selected configuration of the system. Such SOAP vectors represent “local” descriptors, encoding the information on the environments that surrounds each center.

We can also define “global” descriptors, useful for the comparison of different systems: (i) the frame-average SOAP descriptor p¯t={γ¯nnlt}, where each component of the vector is computed as the power spectrum of the density (Eq. 1), after averaging over all the M centers, namely 68:

This gives a global (averaged) picture of the structural features characterizing a specific MD frame. (ii) The simulation-average SOAP descriptor,

that averages all the frame-average SOAP descriptors along the T frames collected through the MD simulation. Equation 5 represents a compact global fingerprint for the equilibrium structure of the system under investigation, provided that the T frames are sampled at the equilibrium; we used the frame-average SOAP to assert similarities between different structures. See further details in Supplementary Figs. 1, 2 and Supplementary Note 1.

Once SOAP feature vectors are computed, the similarity between the SOAP characterization of different supramolecular structures can be inferred via a distance estimation (or kernel function), provided that the dimensionality D of the compared SOAP vectors is the same (i.e., using equal nmax and lmax) between the compared systems. We measure the similarity between two SOAP vectors using a linear polynomial kernel 49,50,

where q = p/∣p∣ is the unit-normalized SOAP vector. Upon normalization, K(i,j) is equal to 1 if the two molecular environments are exactly superimposed or 0 if no overlap occurs. Since Eq. 6 defines a positive-definite kernel, it naturally induces a metric, for which the distance between two feature vectors is defined as50,67

Recently, this SOAP-induced metric was employed in ref. 56 to compare and classify the structural arrangement of lipid membranes represented via different force-fields. We here apply the same approach for the comparison of different supramolecular systems, computing Eq. 7 for the pairs of simulation-average SOAP vectors obtained via the simulation of each different test case.

The SOAP metric of Eq. 7 fully depends on the high-dimensional information contained in the SOAP spectra, and it provides a classification that is free from prior assumptions on the structural order of the considered systems. The setting of the cutoff radius rcut in the calculation of the SOAP can influence the resulting feature vectors, and few considerations are useful in this sense. The comparative analysis of all the systems, reported in Fig. 6, shows that the choice of a smaller cutoff (rcut = 0.8 nm) can lower the accuracy of the comparison, especially when substantially different systems are compared (see the dSOAP matrices in Fig. Fig.6B:top,6B:top, Supplementary Figs. 6, 7 and 9, and Supplementary Note 2 for additional discussion on this matter). Nonetheless, even a rcut = 0.8 nm (which takes into account only the closest neighbors of the monomers in the SOAP analysis, see Supplementary Fig. 4) appears sufficient to obtain a good characterization of the molecular environment in each system, capturing an analogous pattern of relative distances dSOAP with respect to the results obtained with larger rcut (see the rescaled dSOAP matrices in Fig. 6B:bottom). We underline that we chose a fixed value of rcut for all the systems, in each of the analysis reported previously. This is made possible by the fact that the studied systems are all described with a similar resolution, namely employing Martini-based CG models. This gives rise to comparable inter-beads distances and facilitates a comprehensive analysis that accounts for the same neighborhood region per each monomer. In principle it should be possible to include in the dataset also simulation results obtained with molecular models at different resolution, provided that the cutoff is differentiated to have a consistent SOAP description across the different models.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A