the newsletter of
collaborative
computational project 1

Implementation of Hybrid Methods to Model Electronic Structure of Large Molecular Systems

Report on EPSRC Grant GR/K88286/01

Richard J. Hall
Department of Chemistry, University of Manchester, Manchester M13 9PL.
R.Hall@man.ac.uk

Background

In spite of considerable computational and algorithmic advances in recent years, ab initio electronic structure calculations of chemical accuracy are still limited to systems of about 10 - 20 heavy atoms. To allow meaningful calculations to be carried out on considerably larger systems, there is considerable international activity to develop and apply a variety of so-called hybrid or embedding schemes, where only the chemically important part of the system, such as the catalytic centre, is treated at a high level of theory. The remainder of the system is treated at an appropriate lower level of theory, employing, for example, classical models with empiric force fields or continuum treatments. Hybrid methods are of particular value in condensed phase structure and reactivity studies, such as enzyme catalysis, solid state and surface reactivity, areas which embrace interdisciplinary chemical applications. There are a number of leading groups in the UK developing and using hybrid methods, and the mission of this research programme was to provide computational software, of general applicability, to ensure the UK remains at the leading edge in this important area.

The software developments were carried out by the author at Manchester University, and involved collaboration with the permanent staff of the Daresbury Laboratory. In addition, these latter staff carried out other software developments of value to the UK quantum chemistry community.

Software Development at Manchester

The determination of potential energy surfaces, particularly minima and saddle points (corresponding to transition states) is central to modelling chemical reactivity. There is no generally available code that allows such calculations to be carried out using a hybrid formalism. A major part of the work at Manchester involved the development of computer code to permit the geometry optimisation within hybrid schemes, and an investigation of the best way to treat the junction between the two regions.

Geometry Optimisation for QM/MM Methods

Most hybrid quantum mechanical (QM)/molecular mechanical (MM) models involve a relatively small and large number of atoms in the QM and MM regions, respectively. The traditional way to optimise large MM systems, such as those found in the biochemistry area, is to utilise conjugate gradient (CG) techniques1. Such algorithms may take may thousands of cycles to converge, but the computation of the energy and gradient is computationally undemanding. On the other hand, computation of these quantities for a QM system is computationally intensive and in such cases the use of quasi-Newton (QN) algorithms will provide a more rapid and hence more attractive solution1. Such a QN scheme will traditionally demand storage of the hessian matrix and thus can become impractical for very large systems, although it should be noted that algorithms involving linear storage of the hessian are available2. We have combined the efficiency of the QN algorithm with the simplicity of the CG algorithm when designing a scheme which allows for the optimisation of both the QM and MM regions in a hybrid model. We have developed such a program suite initially targeted at the treatment of macromolecular systems. However, subsequently its use for a range of reactivity problems has been demonstrated.

Within the QN method, a good choice of coordinate system can dramatically improve optimisation performance3. We have chosen to implement the delocalised coordinates of Baker4 as our default coordinate scheme since they are much simpler to generate than natural internals. One can improve the stability of an optimisation by performing a partial line search, a minimum being sought along a polynomial built from the energy and gradient at the current and previous optimisation step, which is then used in determining the next optimisation step5. Another technique making use of information from previous cycles is the geometry DIIS (GDIIS) method6. Here the DIIS matrix is built using gradient information from up to five previous points, the coefficients obtained from diagonalisation and normalisation of the matrix being used to interpolate the current gradient and geometry. Both the partial line search and GDIIS methods have been implemented in our current scheme. The location of saddle points on the potential energy surface is, of course, central to understanding reactivity phenomena. Out of the number of algorithms that are available for the location of saddle points on a potential energy surface we have chosen to implement the partial rational function algorithm which searches uphill in one direction while simultaneously minimising all other modes7. When performing such a search, the default BFGS update to the hessian8 is not appropriate, since this formula preserves a positive definite matrix, so that instead, we use the Powell update procedure9. Furthermore, since the nature of the potential energy surface needs to be well characterised in order to locate saddle points, it is prudent to use an analytically determined hessian matrix as the starting point for the update procedure. Such a matrix will contain a more accurate description of the local surface than that obtained from a simple molecular mechanical force field which is the default mode10.

For optimisation of the MM region we use one of the many freely available CG algorithms. Our current method uses an essentially unchanged version of CONMIN11, a FORTRAN program in subroutine form, for constrained optimisation problems. Optimisation is performed in cartesian coordinates, the routine calling an MM programme which returns the energy and gradient of the system, with the gradients on the QM region set to zero, so as to keep that region fixed.

To summarise, a large ("outer") region is treated using a conjugate gradient algorithm and is allowed to relax either for N steps (typically N=500) or until suitable convergence is achieved. A quasi-Newton algorithm is then used to perform one optimisation cycle on a small ("inner") part of the system, for example the active site of an enzyme. The outer region is then allowed to relax for a further N cycles, and this procedure is repeated iteratively until predefined convergence criteria are reached.

Link Atom Treatment

One of the key issues in hybrid calculations is how to treat bonds across the QM-MM junction. There is no unique way to terminate the "inner" (QM) region needed to preserve valency during the calculation of the QM energy and gradient contributions. The different methods may have different effects on the subsequent optimisation of the QM region. Typically one adopts an approach where the total system is truncated across a covalent bond and the dangling bonds are capped with hydrogen "link" atoms. (Figure 1)


FIG 1. QM/MM partitioning scheme. The full system (a) is partioned using (b) link atoms

In the layered schemes introduced by Morokuma et al12, any force on Hlink is projected onto the adjacent CMM and CQM atoms by means of a Jacobian matrix, calculated by constraining the link atom to lie along the C-C bond. If such a model is well behaved, the optimised C-C bond length will lie somewhere between the value optimised at the low (MM) and high (QM) level. An alternative approach is used in our two region QM/MM scheme. Here, there is explicit polarisation of the QM region by the MM region absent in the Morokuma method. The force on a QM atom will contain a term arising from the presence of the outer MM region. This force can be further decomposed into terms arising from point charges and Van der Waals interactions. There is also a force arising from interaction with the link atom, which cannot be rigorously removed.

In order to gain further insight into what effect the introduction of link atoms has on a two region hybrid calculation we have optimised the geometry of a number of amino acid analogues, subject to a variety of constraints. The molecules were first optimised at the HF/6-31G* level. Each molecule was then partitioned into a QM and an MM region along a C-C bond, the QM region being terminated with either a hydrogen atom or a localised orbital. With the MM geometry kept fixed, the QM region was allowed to fully optimise, subject to a set of constraints, each involving the assignment of an infinite mass to one or more atoms, to ensure their position remains fixed throughout the subsequent optimisation steps. This is achieved by following the method of Pulay and Fogarasi13. So if, for example, we wish to freeze a particular atom, CMM, we can assign an infinite mass to that centre, to ensure that CMM will remain fixed in space and thus act as a pivot around which the QM region will relax. If necessary, additional MM centres can be included as dummy atoms of infinite mass, in order to create a reference framework for optimisation.

We have investigated a number of approaches that use link atoms, which are fully described in reference 14. We find that a localised self consistent field formalism15, whereby the bond across the QM/MM junction is described using a localised bond orbital, performs more impressively than a link atom approach. The best link atom approach is when CMM is assigned an infinite mass and in addition the distance between the QM and MM region is kept fixed, and the H link atom is constrained to lie along the CQM - CMM bond (see Figure 1). Such an approach removes any artificial compression across the junction arising from the use of a hybrid methodology.

Code Distribution

One of the most important considerations running throughout the project lifetime has been the care taken to ensure ease of use by all members of the community. A number of effective collaborations have taken place over the three year period and the modular design of the program means that it has been easily incorporated by members of CCP1 into their own codes. As well as interfacing to the GAUSSIAN and AMBER programs used by the group of Hillier at the University of Manchester and by others, collaboration with workers at Daresbury Laboratory has led to a version of the code which can be used within the CHEMSHELL environment. In addition, a port of the code has been carried out to allow its use within the GRACE QM/MM scheme of Williams (University of Bath).

Scientific Achievements

As with all software development projects, most of the actual applications will be in the future. However, to date there have already been a number of scientific applications which have demonstrated the value of the software. The majority have been within the area of modelling enzyme catalysis where it is often difficult to locate stationary structures, particularly transition states, since the chemical reaction involves rearrangement of a large number of atoms of both substrate and residues within the active site. A substantial number of such reactions have been studied; for example kinases, chorismate mutase, thymidine phosphorylase, lactate dehydrogenase, xylose isomerase. (See Publication List).

In protein kinases and lactate dehydrogenase, the calculations have been able to distinguish between alternative mechanisms. In many enzymic reactions, as exemplified by chorismate mutase and thymidine phosphorylase it is essential to consider the motion of the enzyme during the course of the reaction, since this is often an important contributor to the catalytic activity. In the case of chorismate mutase we predict that the barrier for conversion of chorismate to prephenate is reduced from 29.1 kcalmol-1 in the gas phase to only 1.4 kcalmol-1 at the active site of chorismate mutase (Figure 2). Such a reduced barrier and increased exothermicity (from 10.1 kcalmol-1 to 21.6 kcalmol-1) is reflected in an earlier transition state, with a reduced length of the breaking bond (C4 - O3). We find that reactant compression suggested by other workers is not a contributor to the catalysis. A potentially important feature is the enhanced interaction of the carboxylate groups and ether oxygen with neighbouring arginine residues, arising to a large extent from the changing shape of the substrate as the reaction proceeds. Optimisation of both the QM and MM regions is essential to identify these changing interactions.


Fig 2: Transition structure in active site of chorismate mutase

An important current problem in enzyme catalysis is the role of hydrogen tunnelling, in particular the effect that enzyme dynamics has on this process. The code developed by this project is being extremely valuable in tackling this problem, preliminary results for xylose isomerase having recently been reported.

A real value of the present software developments is the flexibility for their use within a range of hybrid models. Thus, in the field of organic reactivity, it has been used to study stereospecificity in reactions of allylstannanes with aldehydes. Reactions at the water/solid interface can be modelled using a hybrid model that consists of a cluster embedded in point charges with surface solvation included by a continuum description. This model, which includes the optimisation code developed here, has been used to study the structure and stability of galena at the interface with aqueous solution.

Outside the field of hybrid models, collaboration with the research group of Michiel Sprik in Cambridge has resulted in a port of the optimisation code to allow its use within the Car Parrinello Molecular Dynamics (CPMD) code which is central to the next CCPI proposed project, the development of the use of CPMD to solve chemical problems.

References

  1. Fletcher, R. Practical Methods of Optimisation Vol 1; John Wiley and Sons: New York.
  2. Fischer, T.H., Almlöf, J. J Phys Chem 1992, 96, 9768.
  3. Baker, J. J Comp Chem 1993, 14, 1085.
  4. Baker, J., Kessi, A., Delley, B. J Chem Phys 1996, 105, 192.
  5. Schlegel, H.B. J Comp Chem 1982, 3, 214.
  6. Császár, P., Pulay, P. J Mol Struct 1984, 114, 31.
  7. Baker, J. J Comp Chem, 1986, 7, 385.
  8. Broyden, C. J Inst Math Appl, 1970, 6, 76.
  9. Powell, M.J.D. Math Comp 1971, 1, 26.
  10. Schlegel, H.B. Theor Chim Acta 1984, 66, 333.
  11. Shanno, D.F., Phua, K.H. Conmin, 1980 available for download at http://www.netlib.org/toms/500.
  12. Dapprich, S., Komáromi, I., Byun, K.S., Morokuma, K., Frisch, M.-J. J Mol Struct (THEOCHEM) 1999, 461, 1.
  13. Pulay, P., Fogarasi, G. J Chem Phys 1992, 96, 2856.
  14. Hall. R.J., Hindle, S.A., Burton, N.A., Hillier, I.H. Aspects of hybrid QM/MM calculations: The treatment of the QM/MM interface region and geometry optimisation with an application to chorismate mutase. J Comp Chem (in press).
  15. Assfield, X., Rivail, J.-L. Chem Phys Lett 1996, 263, 100.

Presentations

Results of the project have been reported at a number of international meetings, in the form of oral presentations and poster sessions.

  • Molecular Graphics and Modelling Society meetings, Southampton 04/99; York 04/00
  • Quantum Bioinorganic Chemistry (QBIC-99), Warwick, 07/99
  • 5th World Congress of Theoretically Oriented Chemists (WATOC-99), Imperial College 08/99
  • Computational Biophysics 2000, Nice, 06/00.

Publications which have so far resulted from this project are listed on the IGR. Reference 14, which describes the software development of this project, is an invited contribution to a special issue of Journal of Computational Chemistry on "Quantum Chemical Methods for Large Molecules".

previous contents forward
design by CCP1, March 2001