Understanding Docking Complexes of Macromolecules Using HADDOCK: The Synergy between Experimental Data and Computations.

This protocol illustrates the modelling of a protein-peptide complex using a synergistic combination of in silico analysis and experimental results. To this end, we use the integrative modelling software HADDOCK, which can incorporate experimental data, such as NMR chemical shift perturbations and biochemical protein-peptide interaction data, as restraints to guide the docking process. Based on the modelling results, a rational mutagenesis approach is used to validate the generated models. The experimental results allow the selection of a final structural model that best represents the bona fide protein-peptide complex. The described protocol can also be applied to model protein-protein complexes. There is no size limit for the macromolecular complexes that can be characterized by HADDOCK as long as the 3D structures of the individual components are available.

Among the docking software used for modelling macromolecular complexes, HADDOCK, an integrative modelling platform, possesses the powerful advantage of being able to implement experimental data directly into the docking process to guide the computations (Dominguez et al., 2003).
Indeed, HADDOCK is specifically designed as a data-driven docking program, although it has two ab initio docking modes for cases when no information is available.
Docking calculations in HADDOCK are driven by ambiguous interaction restraints (AIRs). These are derived from experimental and/or bioinformatics data and are integrated into the docking calculations in order to restrain the conformational search space (Dominguez et al., 2003) and satisfy the a priori information about the protein-protein interaction. AIRs are defined through a list of residues that fall into two categories: active and passive. Active residues are the solvent-exposed residues directly involved in the interaction between the two proteins (typically defined based on the available experimental data). Passive residues are solvent-exposed residues that are close to the active residues. Passive residues are introduced to account for the fact that the experimental data are often scarce and do not fully cover the true binding interface. The main difference between the two types is that active residues are "forced" to be at the interface, whereas passive residues can be at the interface but incur no penalty if they are not.
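The way an ambiguous restraint tolerates several candidate partners can be illustrated with a minimal sketch. It assumes the 1/r^6 sum averaging of atom-atom distances described in Dominguez et al. (2003); the function names and the 2.0 Å upper bound are assumptions made for this illustration and do not reproduce HADDOCK's internal code.

```python
def effective_distance(distances):
    """Combine all candidate atom-atom distances (in Å) of one AIR.

    d_eff = (sum of d**-6) ** (-1/6), so the shortest distances dominate
    and d_eff is never larger than the shortest single distance.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def air_violation(distances, upper_bound=2.0):
    """Return by how much (in Å) d_eff exceeds the assumed upper bound."""
    return max(0.0, effective_distance(distances) - upper_bound)


# One active residue far from most partner atoms but close to one of them:
# the single short distance is enough to keep the violation small.
print(air_violation([25.0, 30.0, 2.5]))
```

Because any one short distance satisfies the restraint, an active residue is pulled towards the partner's interface without prescribing which residue it must contact.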
A flowchart highlighting the main steps of the HADDOCK docking program is shown in Figure 1. The process consists of three successive stages, each with a specific aim:

1) Initial docking by rigid body energy minimization (it0)
In this initial docking stage, the interacting molecules are treated as rigid bodies. They are separated in space and randomly rotated for each docking trial. The docking step is a rigid body energy minimization during which AIRs, based on the experimental data, are included in the energy function in order to restrict the sampling of the conformational space and guide the docking.

2) Semi-flexible simulated annealing in torsion angle space (it1)
This is a refinement stage during which flexibility is introduced stepwise, first along the side chains and then along both the side chains and the backbone of the interfacial residues. These residues are automatically defined for each model based on their proximity to the partner molecule. The flexible refinement protocol is based on simulated annealing using torsion angle molecular dynamics.

3) Final refinement in explicit solvent (water)
In this final refinement stage the complexes are typically solvated in a layer of water and subjected to a short restrained molecular dynamics simulation in Cartesian space. The final solutions are clustered and the resulting clusters are ranked based on the average score of their top 4 members (4 is the minimum number of models required to define a cluster).

This protocol aims at providing a detailed description of the procedure for generating a 3D model structure of a protein-peptide complex using HADDOCK guided by experimental data. We illustrate the protocol by modelling the structure of the complex formed by the Cyclic Nucleotide Binding Domain (CNBD) of the hyperpolarization-activated cyclic nucleotide-gated (HCN) channels bound to the TRIP8bnano peptide. This peptide represents the minimal portion of the brain protein TRIP8b, a regulatory subunit of HCN channels, that binds to the HCN CNBD. Solution NMR data are used to guide the modelling and the resulting models are then validated by rational mutagenesis and biochemical assays.

Access to the HADDOCK web server requires registration, which is free for non-profit users. The NACCESS program, which is used to identify solvent-accessible residues, is freely available to researchers at academic and non-profit-making institutions. Note that NACCESS requires a Linux/Unix system. freeSASA is a free alternative to NACCESS.
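The cluster ranking rule described for the final refinement stage (average score of the top 4 members) can be sketched as follows. This is a minimal illustration, not HADDOCK's own code: the function name, the cluster labels, and the score values are hypothetical, and the sketch assumes that a lower (more negative) score is better.

```python
from statistics import mean


def rank_clusters(cluster_scores, top_n=4):
    """Rank clusters by the mean score of their top_n best members.

    cluster_scores maps a cluster label to the scores of its models.
    Clusters with fewer than top_n models are skipped, since top_n is
    also the minimum cluster size. Lower scores are assumed to be better.
    """
    ranked = []
    for name, scores in cluster_scores.items():
        if len(scores) < top_n:
            continue
        best = sorted(scores)[:top_n]  # most negative scores first
        ranked.append((name, mean(best)))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical scores, for illustration only.
example = {
    "cluster_a": [-100.0, -90.0, -80.0, -70.0, -10.0],
    "cluster_b": [-120.0, -60.0, -50.0, -40.0],
    "cluster_c": [-200.0, -190.0],  # too small to count as a cluster
}
for name, score in rank_clusters(example):
    print(name, score)
```

Averaging only the top members keeps a large cluster from being penalized for its poorly scoring tail, while a single outstanding model cannot carry a cluster on its own.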

Data requirements
1. 3D coordinates of the two components of the protein complex, preferably in the bound conformation, provided in PDB or mmCIF format.

Definition of Active and Passive residues
Active residues can be identified by using a large variety of experimental data, including solution state NMR and biochemical protein-protein interaction data, as described in this protocol.
Passive residues can be manually assigned by the operator, as performed in this protocol, or automatically defined by HADDOCK. In the latter case, the program uses a 6.5 Å distance cutoff from the heavy atoms of the active residues to define the passive ones. Active and passive residues are entered in the input page as a comma-separated list of residue numbers (see Procedure).
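The distance-based definition of passive residues can be sketched as follows. This is a minimal illustration under stated assumptions, not the HADDOCK implementation: the function names and the input layout (a dictionary of heavy-atom coordinates per residue and a precomputed set of solvent-exposed residues, e.g., from NACCESS or freeSASA) are hypothetical.

```python
import math


def define_passive(atoms, active, surface, cutoff=6.5):
    """Select surface residues with a heavy atom within cutoff (Å) of any
    heavy atom of an active residue.

    atoms:   dict mapping residue number -> list of (x, y, z) heavy atoms
    active:  iterable of active residue numbers
    surface: iterable of solvent-exposed residue numbers (assumed input)
    """
    active = set(active)
    passive = []
    for res in sorted(surface):
        if res in active:
            continue  # active residues are never listed as passive
        near = any(
            math.dist(a, b) <= cutoff
            for act in active
            for a in atoms[act]
            for b in atoms[res]
        )
        if near:
            passive.append(res)
    return passive


def residue_list_field(residues):
    """Format residue numbers as the comma-separated list used in the form."""
    return ",".join(str(r) for r in sorted(residues))


# Toy coordinates, for illustration only.
toy_atoms = {10: [(0.0, 0.0, 0.0)], 11: [(3.0, 0.0, 0.0)], 12: [(20.0, 0.0, 0.0)]}
print(residue_list_field(define_passive(toy_atoms, active=[10], surface=[10, 11, 12])))
```

Restricting the search to solvent-exposed residues matters because buried residues near the active ones cannot take part in the interface.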
Note: In the representative case described in this protocol, we define the following active and passive residues for CNBD and TRIP8bnano respectively:

Note: In the representative case described in this protocol, the "Guru interface" is used to run HADDOCK, since it allows more parameters to be specified or modified to fine-tune the docking settings.
Access to this interface requires "guru" level access, which users can request on their registration page.
2. Fill in the HADDOCK input page (Figure 2):
a. Provide a name for the job.
b. Provide the input data, i.e., the 3D coordinates and the lists of the active and passive residues as a comma-separated list of residue numbers for each component of the complex.
ix. Solvated docking parameters.
x. Analysis parameters.

Note: In the specific case of the CNBD-TRIP8bnano complex this parameter is increased by a factor of two (Figure 3, "Sampling parameters" tab).
f. The final water-refined models are clustered and ranked based on their HADDOCK score.
Clustering is performed based on pairwise Root Mean Square Deviations (RMSD; the default cutoff is 7.5 Å) and a minimum number of members required to define a cluster (the default is four).

... of the conformational space during the rigid body docking calculation phase. Furthermore, because the CNBD contains a disordered C-terminal tail, which is necessary for binding (Saponaro et al., 2014), the number of MD steps is also increased to allow the protein to adapt its conformation during the flexible refinement stage.
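The two clustering parameters mentioned in step f (RMSD cutoff and minimum cluster size) can be illustrated with a minimal sketch. It uses a simple greedy neighbour-count scheme chosen for this illustration; it is not presented as the algorithm used by the HADDOCK server, and the function name and toy matrix are hypothetical.

```python
def cluster_models(rmsd, cutoff=7.5, min_size=4):
    """Greedy clustering on a symmetric pairwise RMSD matrix (in Å).

    Repeatedly pick the unassigned model with the most unassigned
    neighbours within cutoff, make them a cluster, and remove them.
    Stop once the largest remaining group is smaller than min_size.
    """
    unassigned = set(range(len(rmsd)))
    clusters = []
    while unassigned:
        neighbours = {
            i: {j for j in unassigned if rmsd[i][j] <= cutoff}
            for i in unassigned
        }
        # Ties are broken by the lowest model index.
        centre = max(sorted(neighbours), key=lambda i: len(neighbours[i]))
        members = neighbours[centre]
        if len(members) < min_size:
            break
        clusters.append(sorted(members))
        unassigned -= members
    return clusters


# Toy matrix: models 0-3 are mutually similar, models 4-5 form a small group.
n = 6
toy = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            toy[i][j] = 2.0 if (i < 4) == (j < 4) else 20.0
print(cluster_models(toy))
```

With the default minimum size of four, the two-member group is discarded, which is why sparsely populated solutions do not appear among the ranked clusters.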

Figure 4. Advanced sampling parameters used for CNBD-TRIP8bnano complex
3. Provide both username and password to submit a job.

Submission data
Once all data have been uploaded and the run submitted, the web server offers the option to download a parameter file and provides a link to the results page. This link is also e-mailed to the user.

Van der Waals energy    -54 ± 6    -59 ± 3

Validation of the generated models by mutagenesis
Rational mutagenesis studies, coupled with an appropriate protein-peptide assay, can be used to validate a cluster-specific contact and eventually perform a second round of modelling introducing as active those residues whose mutations abolish the interaction.

Note: This approach was used to determine the structural model of the CNBD-TRIP8bnano complex.
In order to select residues for mutagenesis, we examine the intermolecular contacts formed in each of the two selected clusters. This reveals that:
a. Residues E264 or E265 of TRIP8bnano interact with residues K665 or K666 of the CNBD in both clusters (see Figure 5).

From this second HADDOCK calculation, fourteen clusters are obtained. Four of these are consistent with the DEER and mutagenesis data (Figure 6). In order to validate these models, we performed a second round of mutagenesis, again based on the unique contacts identified. In particular, in cluster 1, D252 of TRIP8bnano, located at the junction between helix N and helix C, contacts residue N547 of the CNBD, whereas this contact is absent in the other clusters. It is worth noting that cluster 1 is also the top-ranking cluster in terms of energetics, scoring function, and number of structures populating the cluster (Table 2). This is explained by the fact that the second docking calculation included additional ambiguous interaction restraints, which significantly improved the HADDOCK results.

Van der Waals energy    -57 ± 11    -54 ± 6    -49 ± 8    -54 ± 6
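The search for cluster-specific contacts that guides the choice of mutants can be sketched as a set comparison. This is a minimal illustration: the function name is hypothetical, and the contact sets below are toy inputs built around the residue pairs named in the text, not the actual contact lists of the models.

```python
def unique_contacts(contacts_by_cluster):
    """Return, for each cluster, the contacts found in no other cluster.

    contacts_by_cluster maps a cluster label to a set of
    (peptide_residue, protein_residue) pairs.
    """
    unique = {}
    for name, contacts in contacts_by_cluster.items():
        others = set().union(
            *(c for other, c in contacts_by_cluster.items() if other != name)
        )
        unique[name] = set(contacts) - others
    return unique


# Toy contact sets, for illustration only.
toy_contacts = {
    "cluster 1": {("D252", "N547"), ("E264", "K665")},
    "cluster 2": {("E264", "K665")},
}
for name, pairs in unique_contacts(toy_contacts).items():
    print(name, sorted(pairs))
```

A contact shared by all clusters cannot discriminate between them, so only the pairs returned for a single cluster are informative targets for mutagenesis.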

Notes
Tutorials and videos describing the HADDOCK program and its use in various scenarios can be found at http://www.bonvinlab.org/education.