Our protocol, called “flex ddG”, is implemented within the RosettaScripts interface to the Rosetta macromolecular modeling software suite [45], which makes the protocol easily adaptable to future improvements and energy function development. The method can be run using a RosettaScripts XML file that is available in the Supporting Information as Listing 1. Version numbers of tested software are available in Table S1.
Flex ddG method steps are outlined in Fig. 1.
Step 1: The protocol begins with an initial minimization of the input crystal structure of the wild-type protein complex, on backbone ϕ/ψ and side chain χ torsional degrees of freedom. Minimization uses the limited-memory Broyden-Fletcher-Goldfarb-Shanno minimizer implementation within Rosetta, with Armijo inexact line search conditions (option “lbfgs_armijo_nonmonotone”). This minimization and later ones are performed with harmonic restraints that hold pairwise atom distances near their values in the input crystal structure. Restraints are added for all pairs of C-α atoms within 9 Å of each other, using a harmonic score potential with the width (standard deviation) parameter set to 0.5 Å, and are added to the Rosetta score function with a term weight of 1.0. Minimization is run until convergence, defined as an absolute score change upon minimization of less than one Rosetta Energy Unit (REU).
Step 2: Starting from the minimized input structure, which includes both binding partners of the protein-protein complex, the backrub method in Rosetta [31] is used to create an ensemble of models. In brief, each backrub move is applied to a randomly chosen protein segment of three to twelve adjacent residues in the neighborhood of any mutated position. The mutation neighborhood is defined by finding all residues in the protein-protein complex with a C-β atom (C-α for glycines) within 8 Å of any mutant position, then adding each such residue and its adjacent N- and C-terminal residues to the list of neighborhood residues. All atoms in the backrub segment are rotated locally about an axis defined as the vector between the endpoint C-α atoms. The allowed rotation angles for the backrub steps use Rosetta default values as described in Smith & Kortemme, 2008 [31]. Backrub is run at a temperature of 1.2 kT for up to 50,000 backrub Monte Carlo trials/steps (Table S2 shows that a kT of 1.6 gives results similar to a kT of 1.2).
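The mutation neighborhood used in Step 2 can be illustrated with a minimal Python sketch. This is not the Rosetta implementation: the residue dictionaries, the function names, and the choice to measure distances between reference atoms (C-β, or C-α for glycine) of both the candidate residue and the mutant position are assumptions made for illustration. Restricting sequence neighbors to the same chain is likewise an assumption.

```python
import math

def ref_atom(res):
    """Reference coordinate: C-beta, or C-alpha for glycine (assumed convention)."""
    return res["ca"] if res["name"] == "GLY" else res["cb"]

def mutation_neighborhood(residues, mutant_indices, cutoff=8.0):
    """Return sorted indices of residues in the mutation neighborhood.

    residues: list of dicts in sequence order with keys
              'name', 'chain', 'ca', 'cb' (hypothetical data layout).
    mutant_indices: indices into `residues` of the mutated positions.
    cutoff: distance cutoff in angstroms (8 A in the text).
    """
    # Residues whose reference atom lies within the cutoff of any mutant position.
    near = set()
    for i, res in enumerate(residues):
        for m in mutant_indices:
            if math.dist(ref_atom(res), ref_atom(residues[m])) <= cutoff:
                near.add(i)
                break

    # Add each such residue plus its sequence neighbors on the same chain.
    neighborhood = set()
    for i in near:
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(residues) and residues[j]["chain"] == residues[i]["chain"]:
                neighborhood.add(j)
    return sorted(neighborhood)

# Toy input: ten alanines on one chain, spaced 3.8 A apart along x.
toy = [
    {"name": "ALA", "chain": "A", "ca": (3.8 * i, 0.0, 0.0), "cb": (3.8 * i, 1.5, 0.0)}
    for i in range(10)
]
print(mutation_neighborhood(toy, [5]))
```

Backrub segments of three to twelve adjacent residues would then be drawn from the returned positions.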
Up to 50 output models are generated by backrub.
Step 3A: For each of the 50 models in the ensemble output by backrub, the Rosetta “packer” is used to optimize side chain conformations for the wild-type sequence using discrete rotameric conformations [46] and simulated annealing. The packer is run with the multi-cool annealer option [47], which is set to keep a history of the 6 best rotameric states visited during annealing.
Step 3B: Independently and in parallel to Step 3A, side chain conformations for the mutant sequence are optimized on all 50 models, introducing the mutation(s).
Step 4A: Each of the 50 wild-type models is minimized, again with pairwise interatomic distance restraints. Minimization is run with the same parameters as in Step 1, except that the restraint distances are taken from the coordinates of the Step 3A model.
Step 4B: As Step 4A, but for each of the 50 mutant models.
Step 5A: Each of the 50 minimized wild-type models is scored in complex, and the complex partners are scored individually. The scores of the split, unbound complex partners are obtained simply by moving the complex halves away from each other. No further minimization or side chain optimization is performed on the unbound partners before scoring.
Step 5B: In the same fashion as Step 5A, each of the 50 minimized mutant models is scored in complex, and the complex partners are scored individually.
Step 6: The interface ΔΔG score is calculated via Eq. 1 as the arithmetic mean over the models produced.
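A form of Eq. 1 consistent with Steps 5A through 6, written here as an assumption with the two binding partners labelled A and B, is:

$$\Delta\Delta G_{\mathrm{bind}} = \left\langle \Delta G_{\mathrm{complex}}^{\mathrm{MUT}} - \Delta G_{\mathrm{A}}^{\mathrm{MUT}} - \Delta G_{\mathrm{B}}^{\mathrm{MUT}} \right\rangle - \left\langle \Delta G_{\mathrm{complex}}^{\mathrm{WT}} - \Delta G_{\mathrm{A}}^{\mathrm{WT}} - \Delta G_{\mathrm{B}}^{\mathrm{WT}} \right\rangle$$

where the angle brackets denote the arithmetic mean over models. The Python sketch below illustrates this bookkeeping. The function names and the score tuples are hypothetical, and the numbers are invented for illustration rather than taken from any Rosetta run.

```python
from statistics import fmean

def binding_score(complex_score, partner_a_score, partner_b_score):
    """Score of the complex minus the scores of the separated partners."""
    return complex_score - partner_a_score - partner_b_score

def interface_ddg(wt_models, mut_models):
    """Mean mutant binding score minus mean wild-type binding score.

    Each model is a (complex, partner_A, partner_B) tuple of scores,
    as would be produced by Steps 5A and 5B.
    """
    wt_mean = fmean([binding_score(*m) for m in wt_models])
    mut_mean = fmean([binding_score(*m) for m in mut_models])
    return mut_mean - wt_mean

# Invented scores for two wild-type and two mutant models.
wt = [(-100.0, -60.0, -30.0), (-102.0, -61.0, -30.0)]
mut = [(-98.0, -60.0, -30.0), (-99.0, -61.0, -30.0)]
print(interface_ddg(wt, mut))
```

With equal numbers of wild-type and mutant models, the difference of means equals the mean of per-model differences, so the pairing of models does not affect the result.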
Fig. 1. Schematic of the flex ddG protocol.
We evaluate the performance of the protocol by comparing predicted ΔΔG scores to known experimental values, using Pearson’s correlation coefficient (R), Fraction Correct (FC), and Mean Absolute Error (MAE). Fraction Correct is defined as the number of cases in the dataset categorized correctly as stabilizing, neutral, or destabilizing, divided by the total number of cases in the dataset. Stabilizing mutations are defined as those with ΔΔG ≤ −1.0 kcal/mol, neutral as those with −1.0 kcal/mol < ΔΔG < 1.0 kcal/mol, and destabilizing as those with ΔΔG ≥ 1.0 kcal/mol.
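The Fraction Correct metric can be sketched in a few lines of Python. The function names are illustrative, and the sketch assumes that the same ±1.0 cutoffs are applied to both predicted and experimental values. The example values are invented.

```python
def classify(ddg, threshold=1.0):
    """Assign a ddG value to one of the three categories defined above."""
    if ddg <= -threshold:
        return "stabilizing"
    if ddg >= threshold:
        return "destabilizing"
    return "neutral"

def fraction_correct(predicted, experimental, threshold=1.0):
    """Fraction of cases whose predicted category matches the experimental one."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        classify(p, threshold) == classify(x, threshold)
        for p, x in zip(predicted, experimental)
    )
    return matches / len(predicted)

# Invented example: two of the four cases fall in the same category.
predicted = [1.5, 0.2, -1.2, 0.9]
experimental = [2.0, -0.5, 0.3, 1.0]
print(fraction_correct(predicted, experimental))
```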
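MAE (Mean Absolute Error) is defined in Eq. 2 as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - x_i \rvert = \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert \tag{2}$$
where $y_i$ are the predicted ΔΔG values, $x_i$ are the known, experimentally determined values, $e_i = y_i - x_i$ is the prediction error, and $n$ is the number of cases in the dataset.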
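As a minimal sketch, MAE (the mean of the absolute differences between predicted and experimental values) and Pearson’s R can be computed with the Python standard library alone. The function names are illustrative, the input values are invented, and a production analysis would more likely use an established statistics package.

```python
import math

def mean_absolute_error(predicted, experimental):
    """Mean of |y_i - x_i| over all cases."""
    n = len(predicted)
    return sum(abs(y - x) for y, x in zip(predicted, experimental)) / n

def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between predictions and experiment."""
    n = len(predicted)
    mean_y = sum(predicted) / n
    mean_x = sum(experimental) / n
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(predicted, experimental))
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    var_x = sum((x - mean_x) ** 2 for x in experimental)
    return cov / math.sqrt(var_y * var_x)

# Invented values: perfectly correlated but offset in scale.
predicted = [1.0, 2.0, 3.0]
experimental = [2.0, 4.0, 6.0]
print(mean_absolute_error(predicted, experimental))
print(pearson_r(predicted, experimental))
```

The example shows why the two metrics are reported together: a predictor can correlate perfectly with experiment while still carrying a large absolute error.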
As a control, we ran the flex ddG protocol omitting the backrub ensemble generation step. This control protocol can in principle generate multiple models because of the minimization and packing steps, but in practice these models are structurally highly similar or identical.