Molecular docking techniques aim to predict the best matching binding mode of a ligand to a macromolecular partner (here just proteins are considered). It consists in the generation of a number of possible conformations/orientations, i.e., poses, of the ligand within the protein binding site. For this reason, the availability of the three-dimensional structure of the molecular target is a necessary condition; it can be an experimentally solved structure (such as by X-ray crystallography or NMR) or a structure obtained by computational techniques (such as homology modeling) (Salmaso, 2018).
Molecular docking is composed mainly by two stages: an engine for conformations/orientations sampling and a scoring function, which associates a score to each predicted pose (Abagyan and Totrov, 2001; Kitchen et al., 2004; Huang and Zou, 2010). The sampling process should effectively search the conformational space described by the free energy landscape, where energy, in docking, is approximated by the scoring function. The scoring function should be able to associate the native bound-conformation to the global minimum of the energy hypersurface.
Scoring functions play the role of poses selector, used to discriminate putative correct binding modes and binders from non-binders in the pool of poses generated by the sampling engine.
There are essentially three types of scoring functions:
1. Force-field based scoring functions:
Force-field is a concept typical of molecular mechanics (see Box 1) which approximates the potential energy of a system with a combination of bonded (intramolecular) and nonbonded (intermolecular) components. In molecular docking, the nonbonded components are generally taken into account, with possibly the addition of the ligand-bonded terms, especially the torsional components. Intermolecular components include the van der Waals term, described by the Lennard-Jones potential, and the electrostatic potential, described by the Coulomb function, where a distance-dependent dielectric may be introduced to mimic the solvent effect. However, additional terms have been added to the force-field scoring functions, such as solvation terms (Brooijmans and Kuntz, 2003).
Molecular mechanics is a method which approximates the treatment of molecules with the laws of classical mechanics, in order to limit the computational cost required for quantum mechanical calculations (Vanommeslaeghe et al., 2014). Atoms are considered as charged spheres connected by springs, neglecting the presence of electrons, in accordance with Born-Oppenheimer approximation (Born and Oppenheimer, 1927). The potential energy is approximated by a simple function which is called force-field; it is the sum of bonded (intramolecular) and nonbonded energy terms. The basic form of the function comprise bond stretching and bending described by harmonic potential, and torsional potential described by a trigonometric function, in the bonded portion. Nonbonded terms consist of van der Waals and Coulomb electrostatic interactions between couples of atoms.
As an example, these basic components of the CHARMM [78] force field are reported in the following equations
where Kb, Kθ, and Kχ are the bond, angle and torsional force constants; b, θ and χ are bond length, bond angle and dihedral angle (those with the 0-subscript are the equilibrium values); n is multiplicity and δ the phase of the torsional periodic function; rijis the distance between atoms i and j; qi and qj are the partial charges of atoms i and j; ε is the effective dielectric constant; εijis the Lennard-Jones well depth and Rmin, ij is the distance between atoms at Lennard-Jones minimum.
These terms may appear slightly different in different force-fields, and anharmonicity and cross-terms are generally added.
The parameters of the force field are obtained by fitting quantum mechanical or experimental values.
Examples of force field based scoring functions are GoldScore (Verdonk et al., 2003), AutoDock (Morris et al., 1998) (improved as a semiempirical version in AutoDock4, Huey et al., 2007), GBVI/WSA (Corbeil et al., 2012).
2. Empirical scoring functions:
These functions are the sum of various empirical energy terms such as van der Waals, electrostatic, hydrogen bond, desolvation, entropy, hydrophobicity, etc., which are weighted by coefficients optimized to reproduce binding affinity data of a training set by least squares fitting (Huang and Zou, 2010).
The LUDI (Böhm, 1994) scoring function was the first example of an empirical one. Other empirical scoring functions are GlideScore (Halgren et al., 2004; Friesner et al., 2006), ChemScore (Eldridge et al., 1997), PLANTSCHEMPLP (Korb et al., 2009).
3. Knowledge-based scoring functions:
These methods assume that ligand-protein contacts statistically more explored are correlated with favorable interactions. Starting from a database of structures, the frequencies of ligand-protein atom pairs contacts are computed and converted into an energy component. When evaluating a pose, the aforementioned tabulated energy components are summed up for all ligand-protein atom pairs, giving the score of the pose.
DrugScore (Gohlke et al., 2000; Velec et al., 2005) and GOLD/ASP (Mooij and Verdonk, 2005) are examples of knowledge-based scoring functions.
Another strategy consists in the combination of multiple scoring functions leading to the so-called consensus scoring (Charifson et al., 1999).
In addition, new scoring functions have been developed: for example, based on machine learning technologies, interaction fingerprints and attempts with quantum mechanical scores (Yuriev et al., 2015).
The first molecular docking algorithm was developed in the 1980s by Kuntz et al. (1982); the receptor was approximated by a series of spheres filling its surface clefts, and the ligand by another set of spheres defining its volume. A search was made to find the best steric overlap between binding site and receptor spheres, neglecting any kind of conformational movement.
This method belongs to the group of fully-rigid docking techniques, according to the classification which divides docking methods according to the degrees of flexibility of the molecules involved in the calculation Halperin et al., 2002 (Figure (Figure11):
Molecular docking techniques organized according to ligand-protein flexibility and conformational searching engines.
1. Rigid docking:
Both ligand and protein are considered rigid entities, and just the three translational and three rotational degrees of freedom are considered during sampling. This approximation is analogous to the “lock-key” binding model and is mainly used for protein-protein docking, where the number of conformational degrees of freedom is too high to be sampled. Generally, in these methods, the binding site and the ligand are approximated by “hot” points and the superposition of matching point is evaluated (Taylor et al., 2002).
2. Semi-flexible docking:
Just one of the molecules, the ligand, is flexible, while the protein is rigid. Thus, the conformational degrees of freedom of the ligand are sampled, in addition to the six translational plus rotational ones. These methods assume that a fixed conformation of a protein may correspond to the one able to recognize the ligands to be docked. This assumption, as already reported, is not always verified.
3. Flexible docking:
It is based on the concept that a protein is not a passive rigid entity during binding and considers both ligand and protein as flexible counterparts. Different methods have been introduced during the years, some rested on the induced fit binding model and others on conformational selection.
The great number of degrees of freedom introduced by flexible docking makes the potential energy surface to be a function of numerous coordinates. Consequently, the computational effort required to perform a docking calculation is augmented, but both sampling and scoring should be optimized to give a good balance between accuracy and speed. In fact, virtual screening campaign of millions of compounds depends on the velocity of docking calculations. For this reason, more and more improvements have been made in the development of the new algorithm, able to deeply search the phase space but not at the expense of velocity.
Numerous docking algorithms have been developed since the 1980s. Often it is difficult to classify clearly each docking software, because different algorithms may be integrated into a multi-phase approach. However, docking algorithms can be classified as follows (Kitchen et al., 2004; Huang and Zou, 2010):
1. Systematic search techniques:
In a systematic search, a set of discretized values is associated with each degree of freedom, and all the values of each coordinate are explored in a combinatorial way (Brooijmans and Kuntz, 2003). These methods are subdivided into:
Exhaustive search - it is a systematic search in the strict sense since all the rotatable bonds of the ligands are examined in a systematic way. A number of constraints and termination criteria is generally established to limit the search space and to avoid a combinatorial explosion. The docking pipeline of the software Glide (Friesner et al., 2004; Halgren et al., 2004) involves a stage of the exhaustive search.
Fragmentation- the first implementation of ligand flexibility into docking was introduced by DesJarlais et al. (1986), who proposed a method made of fragmentation of the ligand, rigid docking of the fragments into the binding site, and subsequent linking of the fragments. In this way, partial flexibility is implemented at the joints between the fragments. Other methods, defined as incremental construction, dock one fragment first and then attach incrementally the others. Examples of methods utilizing fragmentation are FlexX (Rarey et al., 1996) and Hammerhead (Welch et al., 1996).
Conformational Ensemble- rigid docking algorithms can be easily enriched by a sort of flexibility if an ensemble of previously generated conformers of the ligand is docked to the target, in a sort of conformational selection fashion on the ligand counterpart. Examples are offered by FLOG (Miller et al., 1994), EUDOC (Pang et al., 2001), MS-DOCK (Sauton et al., 2008).
2. Stochastic methods:
Stochastic algorithms change randomly, instead of systematically, the values of the degrees of freedom of the system. The advantage of these techniques is the speed, so they could potentially find the optimal solution really fast. As a drawback, they do not ensure a full search of the conformational space, so the true solution may be missed. The lack of convergence is partially solved by increasing the number of iterations of the algorithm. The most famous stochastic algorithms are (Huang and Zou, 2010):
Monte Carlo (MC) methods- Monte Carlo methods are based on the Metropolis Monte Carlo algorithm, which introduces an acceptance criterion in the evolution of the docking search. In particular, at every iteration of the algorithm, a random modification of the ligand degrees of freedom is performed. Then, if the energy score of the pose is improved, the change is accepted, otherwise, it is accepted according to the probability expressed in the following equation:
where E1 and E0 are the energy score before and after the modification, kB the Boltzmann constant, and T the temperature of the system.
This is the original form of the Metropolis algorithm, but it is implemented in different variants within docking software. Some example are provided by the earlier versions of AutoDock (Goodsell and Olson, 1990; Morris et al., 1996), ICM (Abagyan et al., 1994), QXP (McMartin and Bohacek, 1997), MCDOCK (Liu and Wang, 1999), AutoDock Vina (Trott and Olson, 2010), ROSETTALIGAND (Meiler and Baker, 2006).
b. Tabu search methods- the aim of these algorithms is to prevent the exploration of already sampled zones of the conformational/positional space. Random modifications are performed on the degrees of freedom of the ligand at each iteration. The already sampled conformations are registered, and when a new pose is obtained, it is accepted only if not similar to any previously explored pose. PRO_LEADS (Baxter et al., 1998) and PSI-DOCK (Pei et al., 2006) are two examples of this category.
c. Evolutionary Algorithms (EA) - these algorithms are based on the idea of biological evolution, with the most famous Genetic Algorithms (GAs). The concept of the gene, chromosome, mutation, and crossover are borrowed from biology. In particular, the degrees of freedom are encoded into genes, and each conformation of the ligand is described by a chromosome (collection of genes), which is assigned a fitness score. Mutations and crossovers occur within a population of chromosomes, and chromosomes with higher fitness survive and replace the worst ones. The most famous examples are GOLD (Jones et al., 1995, 1997), AutoDock 3 & 4 (which implement a different version of GA, the Lamarckian GA) (Morris et al., 1998), PSI-DOCK (Pei et al., 2006), rDock (Ruiz-Carmona et al., 2014).
d. Swarm optimization (SO) methods- these methods take inspiration from swarm behavior. The sampling of the degrees of freedom of a ligand is guided by the information deposited by already sampling good poses. For example, PLANTS (Korb et al., 2006) adopts an ACO (Ant Colony Optimization) algorithm, which mimics the behavior of ants, who communicate the easiest way to reach a source of food through the deposition of pheromone. Here, each degree of freedom is associated with a pheromone. Virtual ants choose conformations considering the values of pheromones, and successful ants contribute to pheromone deposition.
Other examples of SOs are SODOCK (Chen et al., 2007), pso@autodock (Namasivayam and Günther, 2007), PSOVina (Ng et al., 2015).
3. Simulation methods:
The most famous example of this category is Molecular Dynamics, a method that describes the time evolution of a system. A wider explanation will be given in section Molecular Dynamics.
Energy minimization methods can be inserted in this category, but generally, they are not used as stand-alone search engines (Kitchen et al., 2004). Energy minimization is a local optimization technique, used to bring the system to the closest minimum on the potential energy surface.
Some attempts have been made to introduce protein flexibility into docking calculations. These methods take advantage of different degrees of approximation and can be divided into approaches that consider single protein or multiple protein conformations (Alonso et al., 2006).
1. Single Protein Conformation:
a. Soft docking:
This method, firstly described by Jiang and Kim (1991), consists of an implicit and rough treatment of protein flexibility. The van der Waals repulsion term employed in force field scoring functions is reduced, allowing small clashes that permit a closer ligand-protein packing. In this way, a sort of induced-fit is simulated. As a drawback, this approach approximates just feeble protein movements and could implicate unreal poses (Apostolakis et al., 1998; Vieth et al., 1999).
b. Sidechain flexibility:
This strategy introduces alternative conformations for some protein side chains (Leach, 1994). This is generally done exploiting databases of rotamer libraries. Some docking methods, such as GOLD, sample some degrees of freedom within their own search engine. Obviously, considering side chain flexibility, huge conformational variations of the protein are neglected by these methods.
2. Multiple Protein Conformations:
Multiple experimental structures may be available for the same target. Moreover, an ensemble of protein conformations can be obtained via computational techniques, such as Monte Carlo or Molecular Dynamics simulations. The idea of multiple protein conformations docking is to take into account all the diverse structures, following different possible strategies:
a. Average grid:
The structures of the ensemble are used to construct a single average-grid, which can be either a simple or weighted average combination of them (Knegtel et al., 1997).
b. United description of the protein:
In this case, the structures do not collapse into an average grid but are used to construct the best performing “chimera” protein. For example, FlexE (Rarey et al., 1996) extracts the structurally conserved portions from the structures of the ensemble and uses them to construct an average rigid structure. This portion is fused to the flexible parts of the ensemble in a combinatorial fashion, giving a pool of “chimeras” that are used for docking.
c. Individual conformations:
The structures of the ensemble are considered as conformations that can possibly be bound by the ligand, so various docking runs are performed, evaluating the ligands of interest on all the target conformations (Huang and Zou, 2007). Moreover, a preliminary benchmark assessing the performance of different target structures in a cross-docking experiment may be employed to filter the ensemble of structures (Salmaso et al., 2016, 2018).
Among the drugs approved by the Food and Drug Administration, few examples of successful applications of CADD are available (Talele et al., 2010). Among them, the renin-inhibitor Aliskiren was developed by means of a combination of molecular modeling and crystallographic structure analysis (Wood et al., 2003). However, the binding of non-peptidomimetic ligands to renin has shown huge structural rearrangement of the protein (Teague, 2003), addressing the problem of considering protein flexibility in drug design campaigns. Recently, a comparative study evaluating the performance of ensemble docking and individual crystal structure docking has been proposed for renin (Strecker and Meyer, 2018). An ensemble of 4 crystal structures outperformed the mean results of individual crystal structures in terms of binding mode prediction and screening utility. The ensemble gave worse results than the best performing crystal structure, which though is not known a priori. Not as good results were obtained through a Molecular Dynamics ensemble when compared to crystallographic structures, as confirmed in other cases reported in the literature (Osguthorpe et al., 2012; Ganser et al., 2018). However, Molecular Dynamics has proven to be effective as a tool to explore molecular conformations and as a docking method itself, as reported in the following paragraphs.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.