Cai CZ et al / Acta Pharmacol Sin 2004 Jan; 25 (1): 1-8

Review

Advances in modeling of biomolecular interactions1

Cong-zhong CAI2,3, Ze-rong LI4, Wan-lu WANG2, Yu-zong CHEN4,5

2Department of Applied Physics, Chongqing University, Chongqing 400044, China; 3Bioprocessing Technology Centre; 4Department of Computational Science, National University of Singapore, Singapore 117543

1 Project supported by the Fundamental and Applied Fundamental Foundation of Chongqing University, No 713411003.

5 Correspondence to Yu-zong CHEN, PhD. Phn 65-6874-6877. Fax 65-6774-6756. E-mail yzchen@cz3.nus.edu.sg

Received 2002-12-20 Accepted 2003-08-06

KEY WORDS computer simulation; computer-aided design; ligands; protein conformation

ABSTRACT

Modeling of molecular interactions is increasingly used in life science research and biotechnology development. Examples are computer aided drug design, prediction of protein interactions with other molecules, and simulation of networks of biomolecules in a particular process in human body. This article reviews recent progress in the related fields and provides a brief overview on the methods used in molecular modeling of biological systems.

INTRODUCTION

Interactions between biomolecules are fundamental to numerous biological processes[1]. Based on these interactions, living organisms maintain complex regulatory and metabolic interaction networks that together constitutes the processes of life. Understanding of biomolecular interactions is therefore the key in solving the mystery of life. The development of modeling tools for molecular interactions is also essential for rational design of therapeutic drugs and new synthetic proteins that can cure diseases and improve the quality of life for all of us.

Revolution in molecular biology has laid the foundation for the development of molecular modeling into a meaningful weapon in the study of biomolecular interactions. Crystallography and NMR have provided and are continuing to provide the 3D structure of a large number of macromolecules. Such knowledge is essential for realistic modeling of biological systems at molecular level. Advances in the understanding of various biological and disease processes have provided information about the targets of modeling investigation and served to guide and test the modeling algorithms.

The extensive application of molecular modeling in the study of biological systems is also possible by the rapid advances in computer technology. The increase of computer power and the decrease of its cost have provided powerful facilities for the simulation of large and complex systems. As a result, we have witnessed a rapid growth in the development and application of computer algorithms and software tools for modeling molecular interactions in various biological systems and in drug discovery studies. This article intends to give an overview on the recent progresses in modeling ligand-protein, ligand-nucleic acid, protein-protein, and protein-nuclei acid interactions. A brief description to the methodologies for modeling each of these interactions is also given.

MOLECULAR MODELING AND DRUG DESIGN

With specific aim at rational drug design, tremendous effort has been made in the development of algorithms and softwares for modeling ligand-protein interactions[2-9].  Until recently, drugs that exert therapeutic effects on human body were discovered exclusively by slow and chance-dependent random screening processes. The average design cycle for a drug was 6-12 years[10]. Such a rate is too slow to meet today's ever increasing and urgent demand to combat various illness such as cancer, AIDS, and heart diseases. In addition to the cost of 359 million US dollar per marketed drug[11] is too expensive to supply affordable medicines. Thus a rational approach to drug design needs to be developed. Modeling of ligand-protein interactions is widely considered as the best approach to accomplish this task.

Rapid advance in molecular biology has provided insight into molecular aspects of disease processes. Proteins involved in many diseases have been identified. Some of these proteins are candidates as target for new drug development (Moreover some genes and RNA related to disease causing proteins are also used as drug targets). A drug compound is designed to bind to its protein target to either directly block its active site (where biochemical reactions take place) or to change its 3D shape (protein function is shape-dependent). In this way the designed drug can inhibit the function of the targeted protein and thus stop the disease process.

While several strategies can be employed to design drugs, the most rational approach is based on the 3D structure of the receptor of to be designed drug. This can be understood from the mechanism of drug binding. Drug binding is similar to inserting a key to a lock. The target protein acts as a lock with one or a few cavities. A drug can bind to it only if the 3D shape of the drug matches that of one of its cavities, and favorable chemical interactions results in the cavity. Hence given the 3D structure of a protein target, compounds can be designed to fit to a cavity, which is called ligand-protein docking. As the shape of both ligand and protein can adjust upon binding (induced fit), this needs to be taken into consideration in an accurate docking study. The best docked compounds can be used as leads to further design drugs by testing and optimizing their therapeutic effect.

Rapid advances in modeling techniques and computer technology have made it possible to do fast speed automated docking on computers. Newly developed softwares are capable of docking 80 000 compounds to a protein in several days on commonly available computer systems[12], which shows its potential in saving time and cost in drug design process.  Computer interactive graphics and modeling has played an important role in the successful design of such drugs as HIV-1 inhibitors[4,13] and a host of antihypertensive, anticancer, antiarthritis, immunosuppressants, and other drugs[2,4].

INTERACTIVE GRAPHICS APPROACHES

Molecular interactions generally involve surface van der Waals contact, hydrogen bond linkage, ionic interaction, and hydrophobic packing. Molecular surface analysis and computer graphics algorithms have been developed to display these important features[2,4]. Algorithms for molecular surface profile were solved in the early 1980s by Connolly and others[14-16]. A smooth three-dimensional contour about a molecule can be generated by rolling probing spheres on the surface atoms represented by a group of spheres of van der Waals radii (Fig 1). The molecular surface envelope can be drawn on either color raster computer displays or real-time vector computer graphics systems. Molecular areas and volumes can also be computed analytically from this surface representation.

Fig 1. Molecular surface envelope (light blue lines) generated by rolling a probing ball on the surface of atoms represented by a group of spheres of van der Waals radii.

In the mid to late 1980s Honig and colleagues had developed accurate numerical methods for calculating the total electrostatic energy of molecules of arbitrary shape and charge distribution[17]. These methods account for both Coulombic and solvent polarization terms. In addition to the solvation energies of individual molecules, these methods can be used to calculate the electrostatic energy associated with conformational changes in proteins as well as changes in solvation energy that accompany the binding of charged substrates. Parallel to this development, the linearized Poisson Boltzmann equation was also used to derive electrostatic interactions between pairs of atoms in proteins[18]. This equation can be solved accurately by a method that takes into account the detailed shape of the protein. Methods for computing electron density[19] and hydrophobic potential[20] had also been developed in the early 1990s.

The surface quantities are represented by a smooth multi-color three-dimensional contour[20] and they can be displayed in popular softwares such as SYBYL and InsightII etc. In addition to other features hydrogen bonds in a molecular system can also be displayed. The hydrogen bonds are detected by scanning the distance between each of potential hydrogen donor and acceptor pairs. Any pair whose separation is less than 3.5 Å is regarded as forming a hydrogen bond.

LIGAND-PROTEIN DOCKING

With the increasing computer power, efforts have been directed towards the development of automated and effective programs for docking a ligand to its receptor sites[21-23].  The first work started in 1982 by Kuntz group[24]. They used a geometric approach to place and orient a ligand in a cavity on a protein. Algorithms based on shape complementarity were designed to examine many binding geometries and evaluate them in terms of steric overlap. This strategy has since been improved and applied to various problems such as systematic screening and drug lead optimization[12,25-27]. The methodology can be described as follows.

In a real ligand-protein binding process, a ligand binds at a site in a cavity of protein. This binding is similar to inserting a lock to a key, except that both parties can adjust its shape upon binding (induced fit) (Fig 2, 3).

Fig 2. A cavity in an enzyme within which its inhibitor binds.

Fig 3. The inhibitor of this enzyme fits into this cavity by a lock and key mechanism.

This binding process can be simulated by a 3-step procedure (Fig 4):

1. Creation of a model of cavity by a cluster of overlapping spheres that fills the cavity.

2. Matching and orient the ligand to the cavity model through comparison between ligand atom position and individual spheres. The best matched ones are selected.

3. The selected ones are rated for chemical as well as geometric complementarity within the cavity.

In Fig 4, to give a better view of the sphere cluster, each sphere is shown as a much smaller sized ball.

Fig 4. The three steps of ligand-protein docking.

The sphere cluster is generated based on the surface feature of a cavity. The program SPHGEN, from DOCK suit of programs[25,27], can generate sphere clusters using surface points derived from the software MS (available from Quantum chemistry program exchange) or MidusPlus[28,29].

To evaluate chemical complementarity of docked ligand-protein systems, various scoring schemes have been created[25,27,30-33].  In these schemes, ligand-protein interaction is evaluated by a grid-based and often simple energy function which ensures the speed for screening large number of compounds to find potential lead compounds that can be docked to a particular receptor.

In view that ligand-protein interaction involves induced fit, much effort has been paid throughout the 1990s to the development of algorithms for flexible docking. One approach has been to use Monte Carlo simulation and simulated annealing to sample ligand flexibility[29,30].  In this approach, a molecular mechanics force field is used to orient the ligand and search for its low energy conformation. The protein pocket is either held rigid or allowed with limited flexibility. The disadvantage of this approach is that it is slow to identify global minima and it can spend significant time in exploring local minima.

In an alternative approach, fragment or incremental construction algorithms have been developed[34-36]. This method divides a ligand into modular pieces and then dock them separately. These pieces are then joined together in the cavity. Applications of this method[12,35-37] showed that this method was relatively efficient and accurate. A drawback is that the initial docking of fragments necessarily reduces the amount of information for fitting the whole ligand to the cavity. In addition the division of the fragments is not unique and it can significantly influence the final docking result[37].

Genetic algorithm has also been applied to flexible docking[38,39]. In this approach, ligand conformations and orientations are represented as populations. Using a series of operations resembling genetic crosses and mutations, followed by selection according to a scoring function, increasingly favorable populations of possible binding structures are propagated. The quality of the result depends on the starting genes, number of mutations and crossover operations, and the quality of scoring function.  This approach gives fairly accurate results and there is progress in improving the speed of genetic algorithm computation[32] .

A ligand-protein inverse docking method INVDOCK[40] was recently introduced by Chen et al as a computer method for identification of potential protein targets of a drug. It has been shown the potential of this approach in: (1) identification of unknown and secondary therapeutic targets of a drug, (2) prediction of potential toxicity and side effect of an investigative drug, (3) probing molecular mechanism of bioactive herbal compounds such as those extracted from plants used in traditional medicines[41-43].

There are other methods that exploit various chemical properties to do efficient docking. The program Hook uses the potential ligand-protein hydrogen bonding sites as a guide to orient and dock a ligand[44].  The software LUDI positions molecules into a cavity so that hydrogen bonds are formed and hydrophobic pockets are filled with hydrocarbon groups[45]. GenStar generates chemically reasonable structures from sp3 carbons to fill the binding site[46]. The distribution of partial charges is also used in a Poisson-Boltzmann approach for flexible docking[47].

MODELING OF LIGAND-NUCLEIC ACID INTERACTIONS

Interactions of small molecules with DNA have been studied for several decades in the hope of learning the principles for targeting specific DNA sequences to control gene expression and design better anticancer drugs[48]. More recently attention is being paid to the modeling of ligand-RNA interactions with the realization that RNA is a promising therapeutic target[49,50].

Early work was concentrated on the study of the interaction of intercalating drugs with DNA. In later 1970s several groups used empirical potential functions to compute the interaction energy between a DNA minihelix and cationic intercalators such as ethidium, profavin, and 9-aminoacridine[51]. In mid 1980s, Kai-xian CHEN et al conducted a series of studies to investigate the structural and energetic factors involved in the sequence selective binding of daunomycin to various double stranded hexanucleotides[52].  They were able to obtain accurate interaction energies and conformational energy changes by employing the SIBFA procedure, which uses empirical formulas based on ab initio SCF computations. Since then, molecular mechanics with improved force fields has been used to compute interaction energy between DNA and various drugs including both intercalators and groove binders[53,54].

The modeling of the thermodynamics of ligand-DNA binding has received a lot of attention in the 1990s. Kollman and colleagues employed a free energy perturbation/molecular dynamics approach to compute the free energy differences between ligand-DNA complexes having different base pair sequences[55]. The results on acridine and daunomycin generally reproduce the sequence dependent binding preference observed experi-mentally. Chen and Prohofsky applied a modified self-consistent phonon approach of a harmonic lattice dynamics algorithm to compute the equilibrium binding constant of daunomycin and netropsin to DNA[56,57] and the effect of drug binding on DNA melting[58]. In their approach, drug motions are decomposed into individual group motions involving bond breaking, bond rotation and translational displacement. These motions can be modeled at atomic level by a harmonic oscillations and the thermodynamics can be solved based on lattice dynamics theory originally introduced in condensed matter physics. In addition, normal mode computation was also used to probe the dynamics and interpret experimental findings of ligand-DNA binding[59,60].

Aimed at the design of novel RNA binding therapeutic agents, attention has recently been focused on the extension of ligand-protein docking algorithms to solve ligand-RNA docking problems.  Although still in an early stage, some useful results have been developed. For instance, in 1998 Leclerc and Cedergren combined 3D-SAR with a docking protocol to determine bound conformations of aminoglycosides which associate with the Rev-binding element (RBE) RNA[61].

PROTEIN-PROTEIN AND PROTEIN-DNA INTER-ACTIONS 

Protein-protein and protein-DNA interactions play central roles in regulatory and metabolic processes in living systems.  Structural information for macromolecular binding is the key in understanding these interactions and the effect on function. Given the disparity between the number of solved protein structures and that of protein-protein and protein-nucleic acid complexes, development of predictive protein docking methods has become a focus in the field[62]. A major development of the past few years has been two blind trials of protein docking methodologies as well as other protein modeling methodologies. One is the Alberta challenge which includes the binding of beta-lactamase and its inhibitory protein[63]. The other is the Comparative assessment of protein structure prediction 2 (CASP2) which involves a hemagglutinin-antibody complex[64].

A widely used approach for rigid docking between proteins is the DOCK algorithm[65], which is essentially similar to that of ligand-protein docking.  To facilitate the determination of potential binding regions, several algorithms have been designed for quickly finding the contacting surfaces of the two binding proteins[62]. These algorithms are based on a sparse critical points method or a grid based representation of molecular surface representation[66].

Simplified representations of protein geometry have also been used by several groups to reduce sensitivity to small perturbations in conformation[62].  Janin and coworkers replaced amino acids with spheres of various radii and performed docking to maximize the buried surface[67]. This is followed by structural refinements using Monte Carlo simulated annealing and energy minimization. Duncan and Olson used spherical harmonics to describe the protein surface and optimizes the geometric complementarity of docking based on genetic algorithm and Monte Carlo simulated annealing[63]. Webster and Rees applied graph theory to molecular docking whereby the core of a molecule is enclosed in an ellipsoid and the remaining parts are possible binding sites[68].  Docking is facilitated by the graphs connecting the ellipsoid surface and the surfaces of the binding proteins followed by evaluation of nonbonded energy. These methods were used in the two docking challenges[62-64].

Another method that has generated substantial interest is Fourier correlation algorithm[69]. This method uses a grid-based molecular representation strategy. Each grid node is assigned a numerical value corresponding to molecular surface, protein core or empty space. Surface complementarity is determined by calculating the Fourier correlation of two grids. The relative orientations are changed and the process repeated until rotational space has been completely searched. This approach has since been improved and applied in the Alberta challenge[62,64].

Efforts have also been directed towards the inclusion of molecular flexibility in protein-protein docking. For instance, energy minimization has been introduced to putative docked structures to relieve the steric clashes[64,66]. Monte Carlo and simulated annealing has also been used to dock proteins[61]. In this approach, a flexible protein ligand moves within a precalculated grid of atomic potentials, while the protein receptor is kept stationary. This limits the otherwise enormous search space.  Further along this line, a global optimization method was introduced which involves a combined Brownian and Monte Carlo followed by an energy minimization[70].

Protein-nucleic acid interactions generally involve highly charged groups and substantially more flexible conformation changes than that of protein-protein interactions. As a result, there has been few attempts in protein-nucleic acid docking. Knegtel and colleagues developed the MONTY algorithm to study protein-DNA docking[71]. In their approach, Monte Carlo simulations are used to explore binding space. Both sidechain and DNA helix flexibility are taken into consideration. Possible binding complexes are selected based on a simplified energy scoring function. Sternberg and colleagues extended their FTDOCK algorithm to repressor-DNA docking[62]. They adopted a Fourier correlation algorithm combined with modified parameters for charge specificity in protein-DNA binding. The unbound form of the protein was docked to a model DNA containing the target sequence. Possible complexes are ranked by an empirical protein-DNA pair potential.

CONCLUSION

Computational methods for biomolecular interactions are important tools in facilitating the understanding of the mechanism of biological processes and in rational design of drugs and novel proteins.  We have witnessed tremendous growth in this field as evidenced by a large number of publications and new software algorithms developed in the last few years. Molecular modeling is widely used in the study of the mechanisms of drugs[72-77] and has contributed to the design of several drugs and lead compounds[2,4,13].  New modeling methods are emerging along with other new developments in biomedical science and biotechnology. Applications of combinatorial chemistry and high-throughput screening in drug discovery processes[78,79] are now pushing the development of tools for molecular diversity and the design of diverse libraries[80].  Accumulation of functional information has begun to reach the level that makes it possible to model biomolecular systems in the cellular and tissue environments[81]. The human genome project and the generation of more and more structural and functional information from other experimental means will further demand and accelerate the development of molecular modeling.

REFERENCES