The cryoDRGN method performs heterogeneous cryo-EM reconstruction by learning a neural network representation of 3D structure. In particular, we use a positionally encoded multilayer perceptron (MLP) to approximate the function V: ℝ³⁺ⁿ → ℝ, which models structures as generated from an n-dimensional continuous latent space. We refer to this architecture as a coordinate-based neural network41,42, as we explicitly model the volume as a function of Cartesian coordinates.
Without loss of generality, we model volumes on the domain [−0.5, 0.5]³. Instead of directly supplying the 3D Cartesian coordinates, k, to the deep coordinate network, coordinates are featurized with a fixed positional encoding function43 consisting of sinusoids whose wavelengths follow a geometric progression from 1 up to the Nyquist limit:

\mathrm{pe}^{(2i)}(k_j) = \sin\!\left(k_j D \pi \left(\tfrac{2}{D}\right)^{i/(D/2 - 1)}\right), \quad i = 0, \ldots, D/2 - 1

\mathrm{pe}^{(2i+1)}(k_j) = \cos\!\left(k_j D \pi \left(\tfrac{2}{D}\right)^{i/(D/2 - 1)}\right), \quad i = 0, \ldots, D/2 - 1

where D is set to the image size used in training. Empirically, we found that excluding the highest frequencies of the positional encoding led to better performance when training on noisy data, and we provide an option to modify the positional encoding function by increasing all wavelengths by a factor of 2π.
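As an illustration, a positional encoding of this form can be implemented in a few lines of NumPy. This is a minimal sketch of the encoding described above, not the cryoDRGN source; the function name and the exact frequency normalization are our assumptions.

import numpy as np

def positional_encoding(coords, D):
    # Featurize coordinates in [-0.5, 0.5] with sinusoids whose wavelengths
    # follow a geometric progression from 1 down to the Nyquist limit (2/D).
    # coords: array of shape (..., 3); returns an array of shape (..., 3 * D).
    i = np.arange(D // 2)
    # angular frequencies from D*pi (Nyquist) down to 2*pi (wavelength 1)
    freqs = D * np.pi * (2.0 / D) ** (i / (D // 2 - 1))
    args = coords[..., None] * freqs                      # (..., 3, D/2)
    features = np.concatenate([np.sin(args), np.cos(args)], axis=-1)
    return features.reshape(*coords.shape[:-1], -1)       # (..., 3 * D)

# example: encode one frequency-space coordinate for a 64-pixel box
k = np.array([0.1, -0.25, 0.0])
print(positional_encoding(k, D=64).shape)                 # (192,)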
This neural representation of 3D structure is learned via an image-encoder/volume-decoder architecture based on the variational autoencoder (VAE)30,44. We follow the standard image formation model in single particle cryo-EM where observed images are generated from projections of a volume at a random unknown orientation, R ∈ SO(3). We use an additive Gaussian white noise model. Volume heterogeneity is generated from a continuous latent space, modeled by the latent variable z, where the dimensionality of z is a hyperparameter of the model.
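Putting this formation model in symbols (our notation, summarizing the text above rather than an equation from the paper): writing g_i for the CTF of image i, t_i for its in-plane shift, R_i for its pose, and z_i for its latent variable, the Fourier transform of an observed image is modeled as

\hat{X}_i(k_x, k_y) = g_i(k_x, k_y)\, \hat{V}\!\left(R_i (k_x, k_y, 0)^{\top},\, z_i\right) e^{-2\pi \mathrm{i}\,(k_x, k_y) \cdot t_i} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)

that is, a CTF-modulated, phase-shifted central slice of the volume plus Gaussian white noise.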
Given an image X, the variational encoder, q_ξ(z|X), produces a mean and variance, μ_{z|X} and Σ_{z|X}, the statistics that parameterize a Gaussian distribution with diagonal covariance as the variational approximation to the true posterior p(z|X). The prior on the latent variable is a standard normal distribution, 𝒩(0, I). The positionally encoded MLP is used as the probabilistic decoder, p_θ(V|k, z), and models structures in frequency space. Given a Cartesian coordinate k ∈ ℝ³ and latent variable z, the probabilistic decoder predicts a Gaussian distribution over V(k, z). The encoder and decoder are parameterized with fully connected neural networks with parameters ξ and θ, respectively.
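The following PyTorch sketch illustrates one possible encoder/decoder pair of this form. The class names, layer widths and depths are our assumptions for illustration and do not reproduce cryoDRGN's exact architecture.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Variational encoder q_xi(z|X): flattened image -> (mu, log-variance)
    # of a diagonal Gaussian over the n-dimensional latent z.
    def __init__(self, D, zdim, hidden=1024):
        super().__init__()
        self.zdim = zdim
        self.net = nn.Sequential(
            nn.Linear(D * D, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * zdim),
        )

    def forward(self, img):                        # img: (B, D, D)
        h = self.net(img.flatten(1))
        return h[:, :self.zdim], h[:, self.zdim:]  # mu, logvar

class Decoder(nn.Module):
    # Probabilistic decoder p_theta(V|k, z): positionally encoded 3D
    # coordinate concatenated with z -> predicted volume value at k.
    def __init__(self, D, zdim, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * D + zdim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pe_coords, z):               # (B, N, 3*D), (B, zdim)
        zz = z[:, None, :].expand(-1, pe_coords.shape[1], -1)
        return self.net(torch.cat([pe_coords, zz], dim=-1)).squeeze(-1)

def reparameterize(mu, logvar):
    # One Monte Carlo sample z ~ q_xi(z|X) via the reparameterization trick.
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)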
Since 2D projection images can be related to volumes as 2D central slices in Fourier space29, oriented 3D coordinates for a given image can be obtained by rotating a D × D lattice spanning [−0.5, 0.5]², originally on the x-y plane, by R, the orientation of the volume during imaging. Then, given a sample drawn from q_ξ(z|X) and the oriented coordinates, an image can be reconstructed pixel by pixel through the decoder. The reconstructed image is then translated by the image's in-plane shift and multiplied by the CTF before it is compared to the input image. The negative log likelihood of a given image under our model is computed as the mean squared error between the reconstructed image and the input image.
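A minimal sketch of this slice-rendering step, reusing the hypothetical Decoder above (pe_fn stands for a torch analogue of the positional encoding sketched earlier; the in-plane translation, which in Fourier space is a per-pixel phase factor, is omitted for brevity):

import torch

def render_slice(decoder, pe_fn, R, z, ctf, D):
    # Rotate a D x D frequency-space lattice (initially on the x-y plane)
    # by the pose R, decode each oriented coordinate, and apply the CTF.
    ax = torch.linspace(-0.5, 0.5, D)
    x, y = torch.meshgrid(ax, ax, indexing="ij")
    lattice = torch.stack([x, y, torch.zeros_like(x)], dim=-1)  # (D, D, 3)
    coords = lattice.reshape(-1, 3) @ R.T         # oriented 3D coordinates
    pred = decoder(pe_fn(coords)[None], z[None])  # decode pixel by pixel
    return pred.reshape(D, D) * ctf               # CTF-modulated slice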
Following the standard VAE framework, the optimization objective is a variational lower bound of the model evidence:

\mathcal{L}(X; \xi, \theta) = \mathbb{E}_{q_\xi(z|X)}\left[\log p_\theta(X|z)\right] - \beta\, D_{\mathrm{KL}}\!\left(q_\xi(z|X)\,\|\,p(z)\right)

where the first term is the reconstruction error estimated with one Monte Carlo sample, the second term is a regularization term on the latent representation, and β is an additional hyperparameter, which we set by default to 1/|z|. By training on many 2D slices with sufficiently diverse orientations, the 3D volume can be learned through feedback from the 2D views. For further details, we refer the reader to a preliminary version of the method described in the proceedings of the International Conference on Learning Representations41. The results presented here employ the training regime described in Zhong et al.41, using previously determined poses from a consensus reconstruction.
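This objective can be written as a per-batch training loss in a few lines; the sketch below assumes the Gaussian encoder statistics (mu, logvar) from the encoder sketched above and uses the closed-form KL divergence between a diagonal Gaussian and the standard normal prior.

import torch
import torch.nn.functional as F

def beta_vae_loss(recon, target, mu, logvar, beta):
    # Negative ELBO: MSE reconstruction error (one Monte Carlo sample)
    # plus a beta-weighted KL term pulling q_xi(z|X) toward N(0, I).
    # By default beta = 1 / zdim, i.e., 1/|z| as described above.
    recon_err = F.mse_loss(recon, target)
    kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return recon_err + beta * kld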