PDB validation metrics

Muyuan Chen

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

PDB validation metrics

MC Muyuan Chen

This method is extracted from research article: bioRxiv, Sep 2024

Building molecular model series from heterogeneous CryoEM structures using Gaussian mixture models and deep neural networks

DOI: 10.1101/2024.09.27.615511

Ask a question

Favorite

In our refinement method, only metrics used in the PDB validation server are implemented to constrain the geometry of the molecules. Since many of the PDB validation metrics are based on statistics from decades-old literature with small sample sizes^24,38,44, it is debatable whether achieving a high validation score is necessary or sufficient for a good protein/RNA structure. For most of the validation metrics, the allowed std is much smaller than the resolution of even the best structures determined using CryoEM. For example, the std of most bond lengths is around 0.02Å, so even an outlier would be only ~0.1Å off from a structure with the ideal geometry. Very often, the difference between models with a good and poor score can be so tiny that they have virtually the same similarity score when compared to a CryoEM map at near-atomic resolution. While our method can produce molecular models with near-perfect validation scores, the actual quality of the models still depends on how accurate the stereochemical statistics are, and the model we generate can be biased if the validation metrics we use are not reliable.

To show the sensitivity of the validation metrics, in a more extreme example, we started from a model of apoferritin (PDB-8T4Q) with a perfect PDB validation score and refined it with an inverted geometry loss function. That is, the optimizer will search locally, starting from a good model, and look for a model with the worst possible geometry score. Surprisingly, a model with an extremely poor score at every metric can be produced, which has only 0.2Å RMSD on average from the input good model, and no atoms shift more than 0.3Å (Figure S7). With such small movement, the two models would be virtually indistinguishable even when the map is determined at near-atomic resolution. As a result, for most CryoEM structures, the geometry scores of the molecular models are actually driven by the set of validation metrics, as well as the model refinement method used, instead of the actual information from the experimental maps.

Additionally, since PDB validation currently does not include scores for the geometry of DNA (other than bond length and angle) or many small molecule ligands, the refinement of those features is only driven by the CryoEM maps. While improving the validation metrics will be an ongoing effort of the broad structure biology community, the DNN implementation of our method makes it relatively convenient to incorporate new, well-defined metrics as a part of the loss function during training, so the quality of output models can improve as the field moves forward.

This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol