Reaction mechanism of metalloenzymes studied by theoretical methods

Geng Dong

DOCTORAL DISSERTATION by due permission of the Faculty of Science, Lund University, Sweden. To be defended on 31st May 2018, at 9:00 in lecture hall B, Centre for Chemistry and Chemical Engineering, Lund. Faculty opponent Prof. Maria João Ramos, University of Porto, Portugal Grading committee Dr. Marcus Lundberg (Uppsala University) Prof. Christine McKenzie (University of Southern Denmark) Assoc. Prof. Yaoquan Tu (Royal Institute of Technology)


Organization LUND UNIVERSITY

Document name DOCTORAL DISSERTATION

Centre for Chemistry and Chemical Engineering P.O. Box 124, SE-221 00, Lund, Sweden.

Date of issue 2018-05-04

Author(s) Geng Dong

Sponsoring organization

Title and subtitle Reaction mechanism of metalloenzymes studied by theoretical methods

Abstract Metalloenzymes catalyse a wide variety of reactions in nature. In this thesis, I have studied the reaction mechanism of three metalloenzymes, viz. [NiFe] hydrogenase (H2ase), dimethyl sulfoxide reductase (DMSOR) and formate dehydrogenase (FDH), by theoretical methods, namely quantum mechanics (QM), combined quantum mechanical and molecular mechanics (QM/MM), as well as QM/MM thermodynamic cycle perturbation (QTCP). For H2ase, we have studied the protonation states of the four cysteine residues in the active site at four intermediate states, the H2 binding site and the full reaction mechanism. Our results demonstrate that the Cys546 residue is most easily protonated by 14−51 kJ/mol, that H2 binding to the Ni ion in the singlet state is most favourable by at least 47 kJ/mol, and that the Ni-L state is not involved in the reaction mechanism. For the H2 binding, we have calibrated density-functional methods against advanced QM methods, like CCSD(T), DMRG-CASPT2 and CAS-srDFT. For DMSOR, we have studied the effect of the protein ligand on the reaction mechanism. Our results indicate that enzymes with a ligand carrying a single negative charge (serine, cysteine, selenocysteine, SH– and OH–) are predicted to have two-step reaction mechanisms, giving an activation energy of 69−85 kJ/mol. However, the O2– and S2– ligands gave much higher activation energies of 212 and 168 kJ/mol. For FDH, we have studied the reaction mechanism. Our results indicate that the substrate formate does not coordinate directly to the Mo ion when it enters the oxidised active site of FDH, but instead resides in the second coordination sphere. The sulfido ligand abstracts a hydride from the substrate, giving a Mo(IV)−SH state. Finally, CO2 is released when the active site is oxidised by two electrons.

Key words Metalloenzymes, reaction mechanism, QM/MM, DFT, DMRG, CAS-srDFT, big-QM, QTCP Classification system and/or index terms (if any) Supplementary bibliographical information

Language English

ISSN and key title

ISBN 978-91-7422-585-3 (pdf) 978-91-7422-584-6 (print)

Recipient’s notes

Number of pages 116

Price

Security classification

I, the undersigned, being the copyright owner of the abstract of the above-mentioned dissertation, hereby grant to all reference sources permission to publish and disseminate the abstract of the above-mentioned dissertation.

Signature

Date

2018-04-18


Funding information: This thesis work was financially supported by China Scholarship Council, the Swedish Research Council and COST through Action CM1305 (ECOSTBio).

Faculty of Science Division of Theoretical Chemistry ISBN 978-91-7422-585-3 (pdf) 978-91-7422-584-6 (print) Printed in Sweden by Media-Tryck, Lund University Lund 2018


For Xiaoqiong


Contents
List of Publications
Popular Science Summary
1 Introduction
  1.1 [NiFe] Hydrogenase
  1.2 Dimethyl Sulfoxide Reductase
  1.3 Formate Dehydrogenase
2 Theory
  2.1 Quantum Mechanics
    2.1.1 The Born−Oppenheimer Approximation
    2.1.2 Spin and the Pauli Exclusion Principle
    2.1.3 Slater Determinants
    2.1.4 Hartree−Fock Method
    2.1.5 Basis Sets
    2.1.6 Post-HF Methods
    2.1.7 Density Functional Theory
    2.1.8 Multiconfigurational Short-Range DFT Method
  2.2 Molecular Mechanics Methods
  2.3 QM/MM Method
  2.4 Big-QM Approach
  2.5 QTCP Method
3 Summary of the Articles
  3.1 Paper I
  3.2 Paper II
  3.3 Paper III
  3.4 Paper IV
  3.5 Paper V
  3.6 Paper VI
4 Conclusions and Outlook
Reference
Acknowledgements


List of Publications

This thesis is based on the following papers, which are found at the end of the thesis. They are referred to by Roman numerals.

I. Protonation states of intermediates in the reaction mechanism of [NiFe] hydrogenase studied by computational methods. Dong, G. & Ryde, U. Journal of Biological Inorganic Chemistry, 2016, 21, 383−394.

II. H2 binding to the active site of [NiFe] hydrogenase studied by multiconfigurational and coupled-cluster methods. Dong, G., Phung, Q. M., Hallaert, S. D., Pierloot, K. & Ryde, U. Physical Chemistry Chemical Physics, 2017, 19, 10590−10601.

III. Exploration of H2 binding to the [NiFe] hydrogenase active site with multiconfigurational density functional theory. Dong, G., Ryde, U., Jensen, H. J. Aa. & Hedegård, E. D. Physical Chemistry Chemical Physics, 2018, 20, 794−801.

IV. The full reaction mechanism of [NiFe] hydrogenase studied by computational methods. Dong, G., Phung, Q. M., Pierloot, K. & Ryde, U. Manuscript.

V. Effect of the protein ligand in DMSO reductase studied by computational methods. Dong, G. & Ryde, U. Journal of Inorganic Biochemistry, 2017, 171, 45−51.

VI. Reaction mechanism of formate dehydrogenase studied by computational methods. Dong, G. & Ryde, U. Manuscript.


List of papers not included in this thesis

1. U. Ryde, G. Dong, J. Li, M. Feldt, R. A. Mata (2016) "Computational studies of molybdenum and tungsten enzymes", in Molybdenum and Tungsten Enzymes, Eds. R. Hille, M. Kirk, C. Schulzke, RSC Metallobiology series no. 7, pp. 275−321. Book chapter.

2. M. Misini Ignjatović, O. Caldararu, G. Dong, C. Muñoz-Gutierrez, F. Adasme-Carreño, U. Ryde (2016) "Binding-affinity predictions of HSP90 in the D3R Grand Challenge 2015 with docking, MM/GBSA, QM/MM, and free-energy simulations", J. Comp.-Aided Mol. Design, 30, 707−730.

3. G. Dong, U. Ryde (2016) "O2 activation in salicylate 1,2-dioxygenase: A QM/MM study reveals the role of His162", Inorg. Chem., 55, 11727−11735.

4. G. Dong, L. Cao, U. Ryde (2018) "Insight into the reaction mechanism of lipoyl synthase: A QM/MM study", J. Biol. Inorg. Chem., 23, 221−229.

5. G. Dong, U. Ryde (2018) "A broken-symmetry DFT study of spin states of noncubane [4Fe–4S] clusters", manuscript.


List of article contributions

I. I performed most of the calculations. I participated in discussion and commented on the manuscript.

II. I performed most of the calculations. I participated in the data analysis and wrote the first version of the manuscript.

III. I performed most of the calculations. I participated in the data analysis and wrote the first version of the manuscript.

IV. I performed most of the calculations. I participated in the data analysis and wrote the first version of the manuscript.

V. I performed most of the calculations. I participated in the data analysis and wrote the first version of the manuscript.

VI. I performed most of the calculations. I participated in the data analysis and wrote the first version of the manuscript.


Popular Science Summary

Metalloenzymes are proteins that contain one or more metal ions bound to the protein. They constitute about one-third of all enzymes known so far and they often perform difficult chemical reactions involving small substrates, like H2 and N2. In our research, we focus on metal ions that are located in the active sites and perform redox reactions, i.e. reactions involving electron transfer.

Why enzymes? Humans benefit from enzymes. For example, more than 700 types of enzymes exist in our body, and O2, which is essential for human life, is generated by enzymes. However, enzymes are complicated and very difficult to understand. In this thesis, theoretical methods were used to investigate enzymes.

Why theoretical methods? Enzymatic reactions are generally fast, so the details of the reaction are hard to study experimentally. Theoretical methods, using computers and software, have become more and more important in the study of enzyme catalytic mechanisms. With theoretical methods, we can construct models that mimic the reaction, so that we can understand it in atomistic detail, e.g. electron and proton transfer, and bond cleavage and formation. The findings from theoretical studies can then be used in experimental studies.

Here, we study three metalloenzymes, viz. [NiFe] hydrogenase, dimethyl sulfoxide reductase (DMSOR) and formate dehydrogenase. [NiFe] hydrogenases catalyse the reversible formation of hydrogen molecules from protons and electrons. This very simple reaction has attracted much interest because H2 may be used as a clean and renewable energy carrier. In DMSOR, the reduced enzyme reacts with dimethyl sulfoxide (DMSO) to generate dimethyl sulfide (DMS). This enzyme is interesting because molybdenum (Mo) is the only known second-row transition metal that is employed by proteins, and Mo enzymes exist in almost all organisms and are involved in the metabolism of many biological systems. Finally, the formate dehydrogenases (FDHs) reversibly convert formate to carbon dioxide. This reaction is a key part of the biological transformations of carbon dioxide (CO2) in the global carbon cycle.

These enzymes are very interesting and play important roles in nature. However, their reaction mechanisms are still not fully understood. In this thesis, we explored the details of the reaction mechanism of the three enzymes with theoretical methods.


1 Introduction

Metalloenzymes exist in all forms of life in nature and play essential roles in the function of all organisms. In this doctoral thesis, three metalloenzymes were studied, viz. [NiFe] hydrogenase (H2ase), dimethyl sulfoxide reductase (DMSOR) and formate dehydrogenase (FDH), using theoretical methods, namely quantum mechanics (QM), combined quantum mechanical and molecular mechanics (QM/MM), big-QM, as well as QM/MM thermodynamic cycle perturbation (QTCP). The background of the three enzymes will be briefly discussed in this section.

1.1 [NiFe] Hydrogenase

Hydrogenases are metalloenzymes that catalyse the reversible conversion of protons and electrons to H2 molecules. In nature, three types of hydrogenases are found, categorised according to the metals in their active sites, viz. [Fe], [FeFe] and [NiFe] hydrogenases. Here, we focus on the [NiFe] hydrogenases. Standard [NiFe] hydrogenases are composed of two subunits, one large and one small.1 The [NiFe] active site is located in the large subunit. As shown in Figure 1, the iron ion is coordinated by one carbon monoxide and two cyanide ligands. In addition, two thiolates from Cys84 and Cys549 bridge the two metals (the residues are numbered according to the enzyme from Desulfovibrio vulgaris Miyazaki F).2 The nickel ion has two additional cysteine ligands (Cys81 and Cys546) that are terminally coordinated. The small subunit harbours three FeS clusters in an electron transfer chain, viz. the proximal [4Fe4S], medial [3Fe4S] and distal [4Fe4S] clusters.

Figure 1. The active site of [NiFe] hydrogenase.

In the catalytic cycle of [NiFe] hydrogenase, a number of intermediates have been identified experimentally.3 The reaction starts from an electron paramagnetic resonance (EPR)-silent state, called the Ni-SIa state, in which the Ni ion is in the +II oxidation state, without any extra ligands. The Fe ion is assumed to remain in the low-spin +II state throughout the reaction. Then, one electron and one proton are added to the active site to generate the Ni-C state. In the Ni-C state, a hydride ion bridges the two metals,4,5 and the Ni ion is oxidised to +III. Next, another H+/e– pair is added, resulting in the Ni-R state. In the Ni-R state, the hydride and the proton react to produce a H2 molecule, which is still bound in the active site. Finally, the H2 molecule is released, regenerating the Ni-SIa state to start a new reaction cycle. Recently, a species called the Ni-L state, which is generated from the Ni-C state, was suggested to be an intermediate that may be involved in the catalytic cycle.6,7 Based on these studies, a tentative reaction mechanism has been suggested, as shown in Figure 2.

Figure 2. The putative reaction mechanism of [NiFe] hydrogenases.

In the catalytic cycle, two protons and two electrons are required to generate one hydrogen molecule. However, the complete proton transfer mechanism of [NiFe] hydrogenase has not been elucidated. Mutation studies showed that the protons are transferred from the terminal cysteine at the active site to a nearby glutamate residue (Glu34 in Figure 1).8 Also, a recent crystallographic study indicated that in an almost (96%) pure Ni-R state, Cys546 is protonated.4

As we can see from Figure 2, the binding site of the hydrogen molecule in the active site of [NiFe] hydrogenase is unclear. Experimentally, carbon monoxide, which is a competitive inhibitor of H2, binds to Ni.9 Xenon-binding experiments showed a binding path that also ends at the Ni ion.10,11 In contrast, the Fe ion has been suggested as the binding site of H2 from an organometallic perspective.12 Likewise, theoretical studies have given varying results when different methods and models were used (see Table 1 in paper II). From previous theoretical studies, we can conclude that the energies are very sensitive to the size of the QM region and the DFT methods. Therefore, advanced methods are required to investigate the H2 binding mode.

Finally, recently published theoretical studies disagree on the details of the hydrogen evolution reaction.13,14 The main difference is whether or not an extra intermediate exists between the Ni-SIa and Ni-R states. In addition, the Ni-L state was found by experimental studies under dark conditions.6,7 However, it is still unclear whether it is involved in the catalytic cycle. In this thesis, we have investigated the protonation states of the four cysteines in the active site (paper I), the H2 binding site (papers II and III), and the full reaction mechanism (paper IV) with our theoretical methods.

1.2 Dimethyl Sulfoxide Reductase

Dimethyl sulfoxide reductases (DMSOR) are enzymes that reduce DMSO to DMS by transferring the oxygen atom of the substrate to a Mo ion (shown in Figure 3).15,16 In this process, two electrons are transferred from the Mo ion to the oxygen atom, which changes the oxidation state of Mo from +IV to +VI.

Figure 3. The overall reaction of DMSOR.

The active site of reduced DMSOR contains two molybdopterin (MPT) cofactors bound to the Mo(IV) ion in a nearly planar fashion (Figure 4),15,16 and one deprotonated side-chain O, S or Se atom from serine, cysteine or selenocysteine at the apical position. The reaction mechanism of DMSOR with serine bound to the Mo ion has been thoroughly studied.17-28 The product DMS is generated by a two-step reaction: 1) the substrate DMSO binds to the Mo(IV) ion; 2) two electrons are transferred from the Mo(IV) ion to the substrate as the S−O bond is cleaved. All studies indicated that the second step is rate-determining, with a barrier of 38−80 kJ/mol.18-23,28,29 Studies from our group have shown that the calculated barrier depends strongly on the theoretical method and that a proper account of dispersion and solvation effects is needed, together with large basis sets and accurate density functional theory (DFT) methods.17,25

Figure 4. The active site of DMSO reductase, showing that the Mo ion coordinates to two MPT cofactors, DMSO and a Ser residue (PDB ID: 4DMR).

However, it is still unclear how the reaction mechanism and rate change when DMSO reacts with enzymes that have different protein-derived ligands (serine, cysteine and selenocysteine). In paper V, small models were used to systematically investigate the reaction mechanism with various protein-derived ligands.

1.3 Formate Dehydrogenase

Biological transformations of carbon dioxide (CO2) are key processes in the global carbon cycle and have attracted great attention, because they may be used to combat the greenhouse effect. In nature, the reversible conversion of CO2 to formate (HCOO–) is catalysed by formate dehydrogenases (FDHs).15,30-33 The FDHs can be divided into two classes, metal-dependent and metal-independent. The metal-dependent FDHs can be further classified into molybdenum- and tungsten-FDHs. Here, we focus on Mo-FDHs. Similar to DMSOR, the active sites of FDHs in the oxidised (MoVI) state contain two molybdopterin cofactors (MPT, Figure 4), one protein-derived ligand (cysteine or selenocysteine) and a sulfido group, which coordinate to the Mo ion in a distorted hexa-coordinated trigonal prismatic geometry.34-38

As the FDHs play important roles in the global carbon cycle, the reaction mechanism has been extensively studied by both experimental and computational methods.15,31,34,39 However, the conclusions from these studies do not agree. As shown in Figure 5, five mechanisms have been suggested (see the detailed discussion in paper VI). The main issues are whether the substrate binds to the Mo ion and whether the protein-derived ligand (cysteine or selenocysteine) dissociates from Mo during the reaction.

Figure 5. Suggested reaction mechanisms of FDHs.

In paper VI, we examined all these reaction mechanisms and discussed these two questions. Finally, a reasonable mechanism was proposed.

2 Theory

In this chapter, I will give a basic introduction to the theoretical methods used in the thesis, viz. the QM-cluster, QM/MM, big-QM and QM/MM thermodynamic cycle perturbation (QTCP) methods. For the QM-cluster approach, density functional theory (DFT), density matrix renormalisation group (DMRG), coupled cluster (CC) and short-range DFT (srDFT) calculations were employed in our papers. All these methods will be briefly introduced in order to give an overview of the underlying concepts. All equations are expressed in atomic units, i.e. ℏ (reduced Planck constant) = m_e (mass of the electron) = e (elementary charge) = k_e (Coulomb’s constant, 1/4πε0) = 1.
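Since the equations are written in atomic units while the energies in this thesis are reported in kJ/mol, a small conversion helper may be useful. The following is only an illustrative sketch; the function names are hypothetical, and the conversion factors are rounded values of the standard constants.

```python
# Rounded conversion factors for the atomic unit of energy (hartree).
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol
HARTREE_TO_EV = 27.211386          # 1 hartree expressed in eV


def hartree_to_kj_per_mol(energy_hartree):
    """Convert an energy from hartree to kJ/mol."""
    return energy_hartree * HARTREE_TO_KJ_PER_MOL


def kj_per_mol_to_hartree(energy_kj_per_mol):
    """Convert an energy from kJ/mol to hartree."""
    return energy_kj_per_mol / HARTREE_TO_KJ_PER_MOL


# Example: an energy difference of 47 kJ/mol (the magnitude quoted in the
# abstract for H2 binding) expressed in atomic units and in eV.
delta_hartree = kj_per_mol_to_hartree(47.0)
print(f"47 kJ/mol = {delta_hartree:.5f} hartree = {delta_hartree * HARTREE_TO_EV:.3f} eV")
```

The example shows that chemically relevant energy differences correspond to only a few hundredths of a hartree, which is why small errors in absolute QM energies matter.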

2.1 Quantum Mechanics

In quantum mechanical (QM) methods, a system is described by the Schrödinger equation, which in the time-independent form is

\[
\hat{H}\Psi = E\Psi \tag{1}
\]

where $\Psi$ is the wave function, which is a function of all electron and nuclear positions in the system. Its squared absolute value, $|\Psi|^2$, corresponds to the probability distribution function for particles to be found in a specified volume element. $\hat{H}$ is the Hamiltonian operator, and $E$ is the total energy of the system. The Hamiltonian is defined by

\[
\hat{H} = -\sum_{i=1}^{n}\frac{1}{2}\nabla_i^2
 - \sum_{A=1}^{N}\frac{1}{2M_A}\nabla_A^2
 - \sum_{i=1}^{n}\sum_{A=1}^{N}\frac{Z_A}{|\mathbf{r}_i-\mathbf{R}_A|}
 + \sum_{i=1}^{n}\sum_{j>i}^{n}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|}
 + \sum_{A=1}^{N}\sum_{B>A}^{N}\frac{Z_A Z_B}{|\mathbf{R}_A-\mathbf{R}_B|} \tag{2}
\]

where $A$ and $B$ denote nuclei, whereas $i$ and $j$ denote electrons. $n$ is the number of electrons, whereas $N$ is the number of nuclei. $M_A$ is the mass and $Z_A$ is the charge of nucleus $A$. $\nabla_i^2$ (the Laplace operator) is the sum of the second derivatives with respect to the coordinates of particle $i$. $\mathbf{r}_i$ and $\mathbf{R}_A$ are the positions of electron $i$ and nucleus $A$, respectively. $|\mathbf{r}_i-\mathbf{R}_A|$, $|\mathbf{r}_i-\mathbf{r}_j|$ and $|\mathbf{R}_A-\mathbf{R}_B|$ are the distances between electron and nucleus, electron and electron, and nucleus and nucleus, respectively.

The first and second terms of Eq. 2 are the kinetic energies of the electrons and nuclei, respectively. The remaining three terms are potential energies: from left to right, the electron−nucleus attraction, the electron−electron repulsion and the nucleus−nucleus repulsion. Unfortunately, the Schrödinger equation can be solved analytically only for a few simple systems, e.g. systems with one nucleus and a single electron. For all larger systems, only approximate numerical solutions can be obtained.
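To make the three potential-energy terms concrete, the sketch below evaluates them for one fixed arrangement of point charges in atomic units. It only illustrates the Coulomb sums. It is not a quantum mechanical expectation value, and the function name and the H2-like geometry are assumptions chosen for the example.

```python
import math
from itertools import combinations


def potential_terms(electrons, nuclei):
    """Return (V_eN, V_ee, V_NN) in hartree for fixed particle positions.

    electrons: list of (x, y, z) positions in bohr
    nuclei:    list of (Z, (x, y, z)) with nuclear charge Z and position in bohr
    """
    # Electron-nucleus attraction: -Z_A / |r_i - R_A| summed over all pairs
    v_en = -sum(z / math.dist(r, pos) for r in electrons for z, pos in nuclei)
    # Electron-electron repulsion: each pair of electrons counted once
    v_ee = sum(1.0 / math.dist(ri, rj) for ri, rj in combinations(electrons, 2))
    # Nucleus-nucleus repulsion: each pair of nuclei counted once
    v_nn = sum(za * zb / math.dist(pa, pb)
               for (za, pa), (zb, pb) in combinations(nuclei, 2))
    return v_en, v_ee, v_nn


# Hypothetical H2-like arrangement: two protons 1.4 bohr apart on the z axis
# and two electrons placed on the same axis, 1.0 bohr outside each proton.
nuclei = [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.4))]
electrons = [(0.0, 0.0, -1.0), (0.0, 0.0, 2.4)]
v_en, v_ee, v_nn = potential_terms(electrons, nuclei)
print(f"V_eN = {v_en:.4f}, V_ee = {v_ee:.4f}, V_NN = {v_nn:.4f} hartree")
```

The pairwise sums use `combinations` so that every electron−electron and nucleus−nucleus pair is counted only once, matching the restricted double sums in the Hamiltonian.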

2.1.1 The Born−Oppenheimer Approximation

Solving the Schrödinger equation can be simplified with the Born−Oppenheimer approximation, which was introduced by Born and Oppenheimer in 1927.40 The key idea of this approximation is the separation of electronic and nuclear motions. The mass of a nucleus is three to five orders of magnitude larger than the mass of an electron, so the electrons move much faster than the nuclei. Therefore, the movement of the nuclei and the coupling between nuclear and electronic motion can be neglected (the second term of Eq. 2), while the nuclear repulsion can be considered to be constant (the last term of Eq. 2). Thus, we obtain

\[
\hat{H}_{\mathrm{el}} = -\sum_{i=1}^{n}\frac{1}{2}\nabla_i^2
 - \sum_{i=1}^{n}\sum_{A=1}^{N}\frac{Z_A}{|\mathbf{r}_i-\mathbf{R}_A|}
 + \sum_{i=1}^{n}\sum_{j>i}^{n}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|} \tag{3}
\]

which is the Hamiltonian of the electronic Schrödinger equation. Note that the nuclei are only present as constant parameters (charges and positions).
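The mass argument behind the approximation can be checked with a few lines of arithmetic. This is a rough sketch: the isotope masses are approximate values chosen for illustration, and the names are hypothetical.

```python
import math

# One atomic mass unit (u) expressed in electron masses (rounded).
ELECTRON_MASSES_PER_U = 1822.888

# Approximate isotope masses in u; the small electron contribution to the
# atomic mass does not change the order of magnitude.
APPROX_MASSES_U = {"H-1": 1.008, "Ni-58": 57.94, "Mo-98": 97.91}


def orders_of_magnitude(mass_u):
    """log10 of the nucleus-to-electron mass ratio."""
    return math.log10(mass_u * ELECTRON_MASSES_PER_U)


for isotope, mass_u in APPROX_MASSES_U.items():
    print(f"{isotope}: nucleus/electron mass ratio ~ 10^{orders_of_magnitude(mass_u):.2f}")
```

The lightest nucleus is a little more than three orders of magnitude heavier than the electron, whereas the metal nuclei relevant for the enzymes studied here are about five orders of magnitude heavier.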

2.1.2 Spin and the Pauli Exclusion Principle

In quantum mechanics, spin is an intrinsic property of particles (without any classical analogue), and it behaves like an angular momentum. Particles with half-integer spins, such as 1/2, 3/2, 5/2, are known as fermions, and those with integer spins are known as bosons. Electrons are fermions with a half-integer spin ($s = 1/2$). Two spin states can be adopted by one electron: the state with $m_s = +1/2$ is called $|\alpha\rangle = \alpha(\omega)$, and the state with $m_s = -1/2$ is called $|\beta\rangle = \beta(\omega)$, where $\omega$ is the spin coordinate.

The Pauli exclusion principle states that two or more identical fermions cannot occupy the same quantum state simultaneously. For a many-particle wave function, the Pauli exclusion principle corresponds to the requirement that the wave function is antisymmetric (for fermions) with respect to interchange of the coordinates $\mathbf{x}_i$ and $\mathbf{x}_j$ of any two fermions. This principle imposes an additional constraint on the wave function.

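2.1.3 Slater Determinants

One possible wave function ansatz for an N-electron system is to construct the wave function as a Slater determinant

\[
\Psi_{\mathrm{SD}}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N) = \frac{1}{\sqrt{N!}}
\begin{vmatrix}
\phi_1(\mathbf{x}_1) & \phi_2(\mathbf{x}_1) & \cdots & \phi_N(\mathbf{x}_1) \\
\phi_1(\mathbf{x}_2) & \phi_2(\mathbf{x}_2) & \cdots & \phi_N(\mathbf{x}_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(\mathbf{x}_N) & \phi_2(\mathbf{x}_N) & \cdots & \phi_N(\mathbf{x}_N)
\end{vmatrix} \tag{4}
\]

Thereby, the total N-electron wave function obeys the Pauli exclusion principle and displays the correct antisymmetry, because the determinant is zero if any two rows or columns are the same and changes sign upon exchange of two rows. In the determinant, $\phi_i$ is a one-electron wave function, also known as a spin orbital, which consists of a spatial orbital $\psi_i(\mathbf{r})$ multiplied by a spin function,

\[
\phi_i(\mathbf{x}) = \psi_i(\mathbf{r})\cdot
\begin{cases}
\alpha(\omega) \\
\beta(\omega)
\end{cases} \tag{5}
\]

The MOs are orthonormal, $\langle\phi_i|\phi_j\rangle = \delta_{ij}$.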
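The antisymmetry of a Slater determinant can be verified numerically. The sketch below builds a three-electron determinant from toy one-dimensional spin orbitals, which are assumptions made only for this illustration and are neither normalised nor orthogonal, and checks the two properties stated above.

```python
import math
from itertools import permutations


def alpha(spin):
    """Spin function for m_s = +1/2."""
    return 1.0 if spin == "up" else 0.0


def beta(spin):
    """Spin function for m_s = -1/2."""
    return 1.0 if spin == "down" else 0.0


def make_spin_orbital(spatial, spin_function):
    """Spin orbital = spatial function times spin function; x = (r, spin)."""
    return lambda x: spatial(x[0]) * spin_function(x[1])


def determinant(matrix):
    """Leibniz formula; adequate for the very small matrices used here."""
    n = len(matrix)
    total = 0.0
    for perm in permutations(range(n)):
        # The sign of the permutation follows from the parity of its inversions
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        product = 1.0
        for row, col in enumerate(perm):
            product *= matrix[row][col]
        total += product if inversions % 2 == 0 else -product
    return total


def slater_determinant(spin_orbitals, coords):
    """Rows run over electron coordinates, columns over spin orbitals."""
    matrix = [[phi(x) for phi in spin_orbitals] for x in coords]
    return determinant(matrix) / math.sqrt(math.factorial(len(spin_orbitals)))


# Toy one-dimensional spatial functions (hypothetical, for illustration only)
s_like = lambda r: math.exp(-r * r)
p_like = lambda r: r * math.exp(-r * r)
orbitals = [make_spin_orbital(s_like, alpha),
            make_spin_orbital(s_like, beta),
            make_spin_orbital(p_like, alpha)]

coords = [(0.3, "up"), (-0.5, "down"), (1.1, "up")]
psi = slater_determinant(orbitals, coords)
# Exchanging the coordinates of electrons 1 and 3 changes the sign
psi_swapped = slater_determinant(orbitals, [coords[2], coords[1], coords[0]])
# Two electrons with identical space and spin coordinates give zero
psi_same = slater_determinant(orbitals, [coords[0], coords[1], coords[0]])
print(psi, psi_swapped, psi_same)
```

Exchanging two electrons exchanges two rows of the matrix, which flips the sign, and identical coordinates produce two identical rows, which makes the determinant vanish.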

2.1.4 Hartree−Fock Method

The Hartree−Fock (HF) method is one of the most common approximations to solve the Schrödinger equation.13,41-43 Here, a single determinant (Eq. 4) is used as the wave function in the Schrödinger equation. Thus, each electron moves in the average potential of all the other electrons, which means that the N-electron problem is converted to a set of one-electron problems, i.e. the electron−electron repulsion is calculated in an average way. Here, I will focus on the restricted Hartree−Fock theory for closed-shell molecules, in which the spatial function $\psi(\mathbf{r})$ in Eq. 5 is the same for the $\alpha$ and $\beta$ electrons in the same orbital. Thus, the many-particle electronic Hamiltonian is replaced by the Fock operator $\hat{F}_i$:

\[
\hat{F}_i = \hat{h}_i + \hat{V}_i^{\mathrm{HF}} = -\frac{1}{2}\nabla_i^2 - \sum_{A=1}^{N}\frac{Z_A}{|\mathbf{r}_i-\mathbf{R}_A|} + \hat{V}_i^{\mathrm{HF}} \tag{6}
\]

\[
\hat{V}_i^{\mathrm{HF}} = \sum_{j=1}^{n/2}\left(2\hat{J}_j - \hat{K}_j\right) \tag{7}
\]

In Eq. 6, the first and second terms on the right-hand side are the electron kinetic energy and the nucleus−electron attraction, respectively. $\hat{V}_i^{\mathrm{HF}}$ is the Hartree−Fock potential, i.e. the mean field from all electrons besides electron $i$. $\hat{J}_j$ is the Coulomb operator, defining the electron−electron repulsion between electrons $i$ and $j$. $\hat{J}_j$ is multiplied by 2 to account for the presence of two electrons in each orbital. $\hat{K}_j$ is the exchange operator, which represents the energy associated with exchanging two electrons. $\hat{J}_j$ and $\hat{K}_j$ operate on a wave function $\phi_i$, which describes the interaction of electron 1 (one-electron Coulomb and exchange operators),

\[
\hat{J}_j(1)\,\phi_i(1) = \left[\int \phi_j^{*}(2)\,\frac{1}{r_{12}}\,\phi_j(2)\,\mathrm{d}\mathbf{r}_2\right]\phi_i(1) \tag{8}
\]

\[
\hat{K}_j(1)\,\phi_i(1) = \left[\int \phi_j^{*}(2)\,\frac{1}{r_{12}}\,\phi_i(2)\,\mathrm{d}\mathbf{r}_2\right]\phi_j(1) \tag{9}
\]

In Eq. 8, the quantity $\phi_j^{*}(2)\frac{1}{r_{12}}\phi_j(2)$ represents the potential energy of one electron at $\mathbf{r}_1$ due to the charge density at $\mathbf{r}_2$, where $r_{12}$ is the distance between $\mathbf{r}_1$ and $\mathbf{r}_2$. Evaluation of the integral gives the total potential energy at $\mathbf{r}_1$ due to the overall (or average) charge density produced by electron 2 in orbital $j$. The solution of the Hartree−Fock equation produces a spin orbital $\phi_i$ that is determined by the average potential energy (or Coulomb field) of all the other electrons. In Eq. 9, $\phi_j^{*}(2)\frac{1}{r_{12}}\phi_i(2)$ represents the potential energy at $\mathbf{r}_1$ due to the overlap charge distribution at $\mathbf{r}_2$ associated with orbitals $i$ and $j$. The integral gives the potential energy due to the total overlap charge density associated with electron 2. However, there is no classical analogue to the exchange energy; it arises from the antisymmetry of the wave function. The orbitals are eigenfunctions of the Fock operator, and the eigenvalue is the orbital energy,

\[
\hat{F}_i\,\phi_i = \varepsilon_i\,\phi_i \tag{10}
\]


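The total electronic energy is not the sum of the orbital energies, because the Fock operator contains terms describing the repulsion to all other electrons ($\hat{J}$ and $\hat{K}$), and the sum over MO energies therefore counts the electron−electron repulsion twice.44 Instead, we can write

\[
E = \sum_{i}\varepsilon_i - \frac{1}{2}\sum_{i}\sum_{j}\left(J_{ij} - K_{ij}\right) \tag{11}
\]

where the sums run over the occupied spin orbitals, and $J_{ij}$ and $K_{ij}$ are the Coulomb and exchange integrals.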

2.1.5 Basis Sets

To solve the set of equations in Eq. 10, we need a description of the orbitals $\phi_i$. For molecules, the molecular orbitals (MOs), $\phi_i$, can be constructed from a set of atomic orbitals (AOs, $\chi_\mu$). This is known as the linear combination of atomic orbitals (LCAO) method. The set of AOs is called the basis set, a term that usually refers to the set of (non-orthogonal) one-particle functions, known beforehand, from which the MOs are built. Typically, basis functions are centred on the atoms. However, it should be noted that basis functions are usually not true atomic orbitals. In general, larger basis sets lead to more accurate results, but also to an increased computational cost. The employed AOs can be either Slater-type orbitals (STO)45 or Gaussian-type orbitals (GTO).46 The STOs are defined by

\[
\chi_{\zeta,n,l,m}(r,\theta,\varphi) = N\,Y_{l,m}(\theta,\varphi)\,r^{\,n-1}\,e^{-\zeta r} \tag{12}
\]

where $n$, $l$ and $m$ are the electron quantum numbers. $N$ is a normalisation constant, $r$ is the distance between the electron and the atomic nucleus, and $\zeta$ is a constant, which is related to the effective charge of the nucleus (the nuclear charge is partly shielded by electrons) and controls the width of the orbital (a large $\zeta$ gives a tight function and a small $\zeta$ gives a diffuse function). $Y_{l,m}$ are the spherical harmonic functions. The Gaussian-type orbitals are most often used, because they are more efficient in numerical calculations. They have the form

\[
\chi_{\zeta,n,l,m}(r,\theta,\varphi) = N\,Y_{l,m}(\theta,\varphi)\,r^{\,2n-2-l}\,e^{-\zeta r^{2}} \tag{13}
\]

STOs are more accurate, but the integrals involved are more complicated. To mimic the STOs, one strategy is to use a linear combination of several GTOs, called contracted GTOs (CGTO). STO-3G is a well-known minimal basis set, which contracts three GTOs to mimic each STO.

A CGTO might give a good approximation to an atomic orbital, but it lacks the flexibility to expand or shrink in the presence of other atoms in a molecule. Therefore, a single CGTO per orbital cannot give highly accurate results. To improve this approximation, more than one CGTO is often used for each atomic orbital. For example, doubling the number of CGTOs per orbital gives a double-ζ (double-zeta) basis set. In practice, we often use a split-valence basis set, i.e. one CGTO is used for the core electrons, but two for the valence electrons. Furthermore, so-called polarisation and diffuse functions can be added to improve the accuracy. In general, the polarisation functions are represented by GTOs of angular momentum $l + 1$, where $l$ is the highest angular momentum of the occupied valence orbitals. Diffuse functions are required for the description of anions and of systems with electron distributions that extend far from the nuclei. Diffuse functions have small exponents, so that they can describe electrons far from the nucleus.

In this thesis, Karlsruhe,47 Pople,48 and atomic natural orbital with relativistic and core correlation (ANO-RCC)-type49 basis sets are used. Among the Karlsruhe basis sets, def2-SV(P) is a valence double-zeta basis set with polarisation functions on heavy atoms; def2-TZVP is a valence triple-zeta basis set with polarisation functions; and def2-QZVPD is a valence quadruple-zeta basis set with polarisation and diffuse functions. The Pople-type basis sets are labelled X-YZG (e.g. 6-31G), where X is the number of GTOs linearly combined to construct the CGTO for the core electrons, and Y and Z indicate that the valence electrons are described by two basis functions (CGTOs), built from Y and Z GTOs, respectively, i.e. a valence double-zeta basis set (likewise, X-YZWG denotes a valence triple-zeta basis set). Among the ANO-RCC basis sets, ANO-RCC-VDZP, ANO-RCC-VTZP and ANO-RCC-VQZP are valence double-, triple- and quadruple-zeta basis sets, respectively.
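How well three Gaussians can mimic a Slater function may be illustrated with the hydrogen 1s function of STO-3G. The sketch below uses the commonly tabulated STO-3G exponents and contraction coefficients for hydrogen (Slater exponent 1.24) and a simple trapezoidal radial integration. The helper names are hypothetical and the integration grid is an arbitrary choice.

```python
import math

ZETA = 1.24  # Slater exponent conventionally used for hydrogen in STO-3G
# Tabulated STO-3G exponents and contraction coefficients for hydrogen 1s
EXPONENTS = (3.42525091, 0.62391373, 0.16885540)
COEFFICIENTS = (0.15432897, 0.53532814, 0.44463454)


def sto_1s(r, zeta=ZETA):
    """Normalised 1s Slater-type orbital."""
    return math.sqrt(zeta**3 / math.pi) * math.exp(-zeta * r)


def gto_1s(r, exponent):
    """Normalised primitive s-type Gaussian."""
    return (2.0 * exponent / math.pi) ** 0.75 * math.exp(-exponent * r * r)


def cgto_1s(r):
    """Contracted Gaussian: fixed linear combination of three primitives."""
    return sum(c * gto_1s(r, a) for c, a in zip(COEFFICIENTS, EXPONENTS))


def radial_integral(f, r_max=30.0, n=60000):
    """Trapezoidal estimate of the integral of f(r) * 4 pi r^2 over [0, r_max]."""
    dr = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * dr
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(r) * 4.0 * math.pi * r * r
    return total * dr


norm_sto = radial_integral(lambda r: sto_1s(r) ** 2)
norm_cgto = radial_integral(lambda r: cgto_1s(r) ** 2)
overlap = radial_integral(lambda r: sto_1s(r) * cgto_1s(r))

print(f"<STO|STO>   = {norm_sto:.4f}")
print(f"<CGTO|CGTO> = {norm_cgto:.4f}")
print(f"<STO|CGTO>  = {overlap:.4f}")
print(f"value at the nucleus: STO {sto_1s(0.0):.3f}, CGTO {cgto_1s(0.0):.3f}")
```

The overlap is close to one, so the contraction reproduces the overall shape of the Slater function, but the value at the nucleus is clearly lower because Gaussians have zero slope at the origin and cannot reproduce the cusp.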

2.1.6 Post-HF Methods

The HF method adopts a mean-field approximation, i.e. the electron−electron repulsion is not described rigorously. As a result, the accuracy of HF calculations is not good enough in many cases. Therefore, several methods use the HF wave function as the starting point and then add a more elaborate description of the electron−electron interactions (electron correlation).

2.1.6.1 Møller−Plesset Perturbation Theory

Møller−Plesset perturbation theory is widely used in quantum chemistry.50 Adopting perturbation theory, we can write the Hamiltonian as

\[ \hat{H} = \hat{H}_0 + \lambda \hat{H}' \tag{14} \]

In Møller−Plesset perturbation theory, we set $\hat{H}_0$ to the sum of the Fock operators and the perturbation to $\hat{H}'$, and $\lambda$ is the perturbation strength. In HF theory, the sum of Fock operators counts the electron−electron repulsion twice, thus the perturbation becomes the exact $\hat{V}_{ee}$ operator minus twice the $\langle \hat{V}_{ee} \rangle$ operator, i.e. $\hat{H}' = \hat{V}_{ee} - 2\langle \hat{V}_{ee} \rangle$. The zeroth-order equation is simply the Schrödinger equation for the unperturbed system and the corresponding energy is

\[ E^{(0)} = \langle \Psi^{(0)} | \hat{H}_0 | \Psi^{(0)} \rangle \equiv E(\mathrm{MP0}) \tag{15} \]

This is just the sum of the energies of the occupied MOs. The first-order correction to the energy is

\[ E^{(1)} = \langle \Psi^{(0)} | \hat{H}' | \Psi^{(0)} \rangle \equiv E(\mathrm{MP1}) \tag{16} \]

This term corrects for the overcounting of the electron−electron repulsion at zeroth order. Thus, the sum of E(MP0) and E(MP1) is the HF energy. The second-order (MP2) energy, which is the most widely used approximation, can be expressed as

\[ E^{(2)} = \sum_{n} \frac{\left| \langle \Psi^{(0)} | \hat{H}' | \Psi_n^{(0)} \rangle \right|^2}{E^{(0)} - E_n^{(0)}} \tag{17} \]

where the sum runs over the excited determinants $\Psi_n^{(0)}$ with zeroth-order energies $E_n^{(0)}$.
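The order-by-order structure of Eqs. 15−17 can be illustrated on a small model problem. The following Python sketch applies the same perturbation expressions to a hypothetical three-state Hamiltonian (the matrices `H0` and `V` are invented for illustration and do not represent a molecule) and compares the result with exact diagonalisation.

```python
import numpy as np

# Hypothetical 3-state model: H0 is diagonal (zeroth-order energies), V is the perturbation.
H0 = np.diag([0.0, 1.0, 2.0])
V = np.array([[-0.05, 0.10, 0.05],
              [0.10, 0.00, 0.10],
              [0.05, 0.10, 0.00]])


def perturbation_energies(h0, v, state=0):
    """Return the zeroth-, first- and second-order energies of one state."""
    e = np.diag(h0)
    e0 = float(e[state])                      # analogue of E(MP0)
    e1 = float(v[state, state])               # analogue of E(MP1)
    e2 = float(sum(v[state, n] ** 2 / (e[state] - e[n])
                   for n in range(len(e)) if n != state))  # analogue of Eq. 17
    return e0, e1, e2


e0, e1, e2 = perturbation_energies(H0, V)
exact = float(np.linalg.eigvalsh(H0 + V)[0])  # lowest eigenvalue of the full Hamiltonian
```

For the ground state the denominators in the second-order term are all negative, so this correction always lowers the energy; in the model above it brings the estimate much closer to the exact eigenvalue than the first-order sum alone.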

2.1.6.2 Configurational Interaction Methods

Another post-HF method is configurational interaction (CI).51,52 In order to account for electron correlation, a variational wave function is constructed as a linear combination of configuration state functions (CSFs) built from spin orbitals,

\[ \Psi_{\mathrm{CI}} = \sum_i a_i \Psi_i = a_0 \Psi_0 + \sum_S a_S \Psi_S + \sum_D a_D \Psi_D + \cdots \tag{18} \]

The first term, $\Psi_0$, is normally the HF determinant and the other CSFs represent determinants with some electrons excited to virtual orbitals. If only one-electron excitations are included (swapping one occupied spin orbital with a virtual orbital in the determinant), the method is called CIS; if only two-electron excitations are allowed, it is called CID. The most commonly used variant is CISD, which is limited to single and double excitations. The method in which all possible determinants are considered (for a given basis set) is called full CI (FCI); it represents the exact solution within that basis set. In a CI calculation, the energy is minimised by varying the coefficients $a_i$ in Eq. 18.
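Minimising the energy with respect to the linear coefficients of Eq. 18 is equivalent to finding the lowest eigenvalue of the Hamiltonian matrix in the CSF basis. A minimal Python sketch, using an invented 3 × 3 matrix (index 0 plays the role of the HF determinant; the numbers are hypothetical):

```python
import numpy as np

# Hypothetical Hamiltonian matrix in a basis of three CSFs (index 0 = HF determinant).
H = np.array([[-1.00, 0.10, 0.05],
              [0.10, -0.20, 0.08],
              [0.05, 0.08, 0.30]])


def rayleigh_quotient(h, a):
    """Energy expectation value for a (not necessarily normalised) coefficient vector a."""
    a = np.asarray(a, dtype=float)
    return float(a @ h @ a / (a @ a))


energies, vectors = np.linalg.eigh(H)
e_ci = float(energies[0])          # variational minimum over all coefficient vectors
a_ci = vectors[:, 0]               # optimal CI coefficients
e_hf = rayleigh_quotient(H, [1.0, 0.0, 0.0])   # energy of the reference determinant alone
```

Any trial coefficient vector gives an energy at or above `e_ci`, and the difference `e_ci - e_hf` is the correlation energy recovered within this (tiny) CSF space.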


2.1.6.3 Coupled Cluster Methods

The coupled cluster (CC) method is accurate, but has a very high computational cost.53 In particular, CC with singles, doubles and perturbatively treated triples, CCSD(T), has become the current gold standard of quantum chemistry. The CC wave function is written in terms of an exponential operator,

\[ |\Psi_{\mathrm{CC}}\rangle = e^{\hat{T}} |\Psi_0\rangle \tag{19} \]

where $\Psi_0$ is a Slater determinant, usually the HF wave function, and $\hat{T}$ is an excitation operator that, acting on $\Psi_0$, generates a linear combination of excited Slater determinants. It is expressed as

\[ \hat{T} = \hat{T}_1 + \hat{T}_2 + \hat{T}_3 + \cdots \tag{20} \]

where $\hat{T}_1$ is the operator of all single excitations, $\hat{T}_2$ is the operator of all double excitations, and so forth. The exponential operator can be written as a Taylor series (shown here for $\hat{T} = \hat{T}_1 + \hat{T}_2$),

\[ e^{\hat{T}} = 1 + \hat{T} + \frac{\hat{T}^2}{2!} + \cdots = 1 + \hat{T}_1 + \hat{T}_2 + \frac{\hat{T}_1^2}{2} + \hat{T}_1 \hat{T}_2 + \frac{\hat{T}_2^2}{2} + \cdots \tag{21} \]

In practice, the expansion of $\hat{T}$ into individual excitation operators is usually terminated at the second level of excitation. Coupled cluster methods usually recover more correlation energy than CI methods with the same maximum excitation level, owing to the non-linear nature of the exponential function.

2.1.6.4 Complete Active Space Methods

All the post-HF methods mentioned above assume that the HF Slater determinant is a qualitatively correct reference wave function and thus that the correlation is small. However, for e.g. near-degenerate ground states and bond-breaking reactions, the HF approximation becomes problematic. These cases are said to involve static correlation, which means that more than one determinant has a large weight in the total wave function. The post-HF methods described so far, e.g. CCSD(T), treat the dynamic correlation (which is described by a large number of determinants with small weights) well, but they usually fail for static correlation. The multi-configurational self-consistent field (MCSCF) methods can be considered as a CI in which not only the coefficients of Eq. 18 are optimised, but also the MOs used for constructing the determinants.54 Thus, the $\Psi_i$ in Eq. 18 are now considered as $\Psi_i = |\varphi_1 \varphi_2 \cdots \varphi_N|$, where

\[ \varphi_p = \sum_\mu c_{\mu p} \Phi_\mu \tag{22} \]

Here, the $c_{\mu p}$ are the MO coefficients. Both the CI and MO coefficients, $a_i$ and $c_{\mu p}$, are optimised in MCSCF. With the selected configurations, MCSCF can generate a qualitatively correct wave function, i.e. it recovers the static part of the correlation. One of the most commonly used MCSCF methods is the complete active space self-consistent field (CASSCF) method.55 The selection of configurations is done by partitioning the MOs into active and inactive spaces. In the active space, a full CI is performed, i.e. all possible excitations are considered. Using the reference wave function from a CASSCF calculation, complete active space perturbation theory (CASPT2) can be employed to obtain the dynamic correlation, in a similar manner as MP2 is applied to the HF wave function.56

In a CASSCF calculation, only a small number of orbitals and electrons can be handled in the active space (currently up to 18 electrons in 18 orbitals). In 1992, the density matrix renormalisation group (DMRG) theory was introduced.57 This method can solve the CASSCF problem with a significantly larger active space (up to about 50 electrons in 50 orbitals). DMRG arranges the orbitals ("sites") in the CAS linearly. Each orbital has a physical degree of freedom $|\sigma_i\rangle$. We can write the wave function as

\[ |\Psi\rangle = \sum_{\sigma_1 \dots \sigma_L} c^{\sigma_1 \dots \sigma_L} |\sigma_1 \dots \sigma_L\rangle \tag{23} \]

This is similar to Eq. 18, but the coefficients in Eq. 23 are written as a coefficient tensor. The tensor $c^{\sigma_1 \dots \sigma_L}$ can be rewritten as a matrix product state (MPS) using a series of singular-value decompositions (SVD),

\[ |\Psi\rangle = \sum_{\sigma_1 \dots \sigma_L} \sum_{a_1 \dots a_{L-1}} M^{\sigma_1}_{1,a_1} M^{\sigma_2}_{a_1,a_2} \cdots M^{\sigma_L}_{a_{L-1},1} |\sigma_1 \dots \sigma_L\rangle \tag{24} \]

The DMRG algorithm optimises the site matrices iteratively. $M^{\sigma_i}_{a_{i-1},a_i}$ is the matrix for site i (a rank-3 tensor when the physical index $\sigma_i$ is included) with a dimension of at most D × D, where D is the bond (virtual) dimension of the MPS. The value of D determines how accurately DMRG approximates the full CAS wave function. Values of 1000−2000 are usually sufficient, but this can be system-dependent.
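The step from the coefficient tensor of Eq. 23 to the matrix product state of Eq. 24 can be sketched with successive SVDs. The Python example below is only an illustration of this decomposition (it is not the DMRG optimisation itself): it factorises a random four-site tensor, with four states per site as for a spatial orbital, and shows the effect of limiting the bond dimension D. The function names and the `max_bond` parameter are hypothetical.

```python
import numpy as np


def tensor_to_mps(coeffs, max_bond=None):
    """Split a rank-L coefficient tensor into site tensors of shape (D_left, d, D_right)."""
    dims = coeffs.shape
    sites = []
    rest = coeffs.reshape(1, -1)            # (left bond, all remaining physical indices)
    for d in dims[:-1]:
        left = rest.shape[0]
        rest = rest.reshape(left * d, -1)   # group left bond with the current site index
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        keep = len(s) if max_bond is None else min(max_bond, len(s))
        sites.append(u[:, :keep].reshape(left, d, keep))
        rest = s[:keep, None] * vh[:keep, :]   # carry singular values to the right
    sites.append(rest.reshape(rest.shape[0], dims[-1], 1))
    return sites


def mps_to_tensor(sites):
    """Contract the site tensors back into the full coefficient tensor."""
    result = sites[0]
    for site in sites[1:]:
        result = np.tensordot(result, site, axes=([-1], [0]))
    return result.reshape(result.shape[1:-1])   # drop the dummy boundary bonds


rng = np.random.default_rng(0)
coeffs = rng.normal(size=(4, 4, 4, 4))      # 4 sites, 4 states per site
coeffs /= np.linalg.norm(coeffs)

exact_sites = tensor_to_mps(coeffs)                   # no truncation
truncated_sites = tensor_to_mps(coeffs, max_bond=4)   # bond dimension D = 4
err_exact = float(np.linalg.norm(mps_to_tensor(exact_sites) - coeffs))
err_trunc = float(np.linalg.norm(mps_to_tensor(truncated_sites) - coeffs))
```

Without truncation the MPS reproduces the tensor to numerical precision, but the middle bond then has dimension 16; restricting D to 4 introduces an error. A random tensor is a worst case in this respect, whereas DMRG relies on physical wave functions being compressible with a moderate D.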


2.1.7 Density Functional Theory

The original idea of density functional theory (DFT) is to characterise a system by its electron density $\rho(\mathbf{r})$, which depends on only three variables, viz. the Cartesian coordinates x, y, and z. The computational effort is thus significantly reduced compared to wave function methods. However, if all the energy components are expressed as functionals of the electron density, DFT methods turn out to give poor results.44 Nowadays, DFT is based on Kohn−Sham theory, in which the electron kinetic energy is calculated from an auxiliary set of orbitals used to represent the electron density, meaning that a dependence on orbitals is introduced.

2.1.7.1 Kohn−Sham DFT

Kohn and Sham formulated a variant of DFT that uses the framework of the Hartree−Fock method,58 and the Kohn−Sham electronic energy is expressed as

\[ E_{\mathrm{KS}}[\rho] = T_s[\rho] + E_{ne}[\rho] + J[\rho] + E_{xc}[\rho] \tag{25} \]

where $T_s$ is the kinetic energy of the non-interacting electrons, $E_{ne}$ is the electron−nucleus attraction, $J$ is the electron repulsion, and the last term is known as the exchange−correlation energy, which will be discussed in the next section. The one-electron Kohn−Sham operator is

\[ \hat{h}_{\mathrm{KS}} = -\frac{1}{2} \nabla^2 - \sum_A \frac{Z_A}{|\mathbf{r} - \mathbf{R}_A|} + \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r}' + V_{xc} \tag{26} \]

The Kohn−Sham electron density $\rho(\mathbf{r})$ is obtained from the orbitals $\chi_i$ of the Slater determinant,

\[ \rho(\mathbf{r}) = \sum_i^{N} |\chi_i(\mathbf{r})|^2 \tag{27} \]

and the exchange−correlation potential, $V_{xc}$, is defined as

\[ V_{xc} = \frac{\delta E_{xc}[\rho]}{\delta \rho(\mathbf{r})} \tag{28} \]

The eigenvalue equation we solve in the Kohn−Sham method is

\[ \hat{h}_{\mathrm{KS}} \chi_i = \varepsilon_i \chi_i \tag{29} \]


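where $\varepsilon_i$ is the corresponding orbital energy. In DFT methods, the exchange term of HF theory disappears and the exchange−correlation energy ($E_{xc}$) is accounted for by the $V_{xc}$ term.

2.1.7.2 Exchange−Correlation Functionals

The only unknown part of Eq. 25 is the exchange−correlation (XC) functional, which can be split into exchange and correlation parts. There are many approximations to the exchange and correlation potentials, $V_x(\rho)$ and $V_c(\rho)$, respectively.59 A basic approximation is the so-called local-density approximation (LDA), which assumes that the density is slowly varying, so that the inhomogeneous density of a molecule can be approximated locally by that of the homogeneous electron gas. The exchange−correlation energy is

\[ E_{xc}^{\mathrm{LDA}}[\rho] = \int \rho(\mathbf{r}) \, \varepsilon_{xc}(\rho(\mathbf{r})) \, d\mathbf{r} = \int \rho(\mathbf{r}) \left( \varepsilon_x(\rho(\mathbf{r})) + \varepsilon_c(\rho(\mathbf{r})) \right) d\mathbf{r} \tag{30} \]

where $\varepsilon_{xc}(\rho)$ is the exchange−correlation energy density (the exchange−correlation energy per particle of a uniform electron gas with density $\rho$). In LDA, $\varepsilon_{xc}(\rho)$ is a function of the local density only. The corresponding exchange−correlation potential is

\[ V_{xc}^{\mathrm{LDA}}(\mathbf{r}) = \frac{\delta E_{xc}^{\mathrm{LDA}}}{\delta \rho(\mathbf{r})} = \varepsilon_{xc}(\rho(\mathbf{r})) + \rho(\mathbf{r}) \frac{\partial \varepsilon_{xc}(\rho)}{\partial \rho} \tag{31} \]

For the homogeneous electron gas, the exchange-energy functional can be expressed exactly as

\[ E_x^{\mathrm{LDA}}[\rho] = -\frac{3}{4} \left( \frac{3}{\pi} \right)^{1/3} \int \rho(\mathbf{r})^{4/3} \, d\mathbf{r} \tag{32} \]

There is no explicit expression for the correlation part, but it can be obtained from quantum Monte Carlo simulations. Another approximation for the exchange−correlation energy is the generalised gradient approximation (GGA), according to which the exchange−correlation functional depends on both $\rho$ and $\nabla\rho$ (i.e. the first derivative of $\rho$, also called the gradient of the charge density). One commonly used GGA exchange functional is that of Becke,

\[ \Delta\varepsilon_x^{\mathrm{B88}} = -\beta \rho^{1/3} \frac{x^2}{1 + 6 \beta x \sinh^{-1} x} \tag{33} \]

where $x = |\nabla\rho| / \rho^{4/3}$ and $\beta$ = 0.0042, which was determined from the best fit to the energies of six noble gas atoms using the sum of the LDA and GGA exchange terms. At the next level of approximation, the meta-GGAs improve the accuracy by also employing the Laplacian (second derivative) of $\rho$. In practice, they usually include the kinetic energy density ($\tau$) instead of the Laplacian, because it is numerically more stable,

\[ \tau(\mathbf{r}) = \frac{1}{2} \sum_i^{\mathrm{occ}} |\nabla \chi_i(\mathbf{r})|^2 \tag{34} \]

where $\chi_i$ are the occupied Kohn−Sham orbitals. Finally, the hybrid functionals add a fraction of HF exchange to the DFT exchange−correlation functional. They have the general form

\[ E_{xc}^{\mathrm{hybrid}} = (1 - a) E_{xc}^{\mathrm{DFT}} + a E_x^{\mathrm{HF}} \tag{35} \]

One of the most widely used hybrid functionals is B3LYP, which can be expressed as

\[ E_{xc}^{\mathrm{B3LYP}} = (1 - a) E_x^{\mathrm{LSDA}} + a E_x^{\mathrm{HF}} + b \, \Delta E_x^{\mathrm{B88}} + (1 - c) E_c^{\mathrm{LSDA}} + c E_c^{\mathrm{LYP}} \tag{36} \]

where a = 0.20, b = 0.72 and c = 0.81.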
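As a small worked example of a local functional, the LDA exchange expression of Eq. 32 can be evaluated numerically for a density that is known in closed form. The Python sketch below uses the exact 1s density of the hydrogen atom in atomic units; the choice of system, the radial grid and the function name are assumptions made for illustration, and the spin-unpolarised formula is applied as it stands in Eq. 32.

```python
import numpy as np

C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)   # prefactor of Eq. 32


def lda_exchange_energy(rho, r):
    """E_x^LDA = -C_x * integral of rho^(4/3) over space, for a spherical density rho(r)."""
    integrand = 4.0 * np.pi * r**2 * rho ** (4.0 / 3.0)
    # Trapezoidal rule on the radial grid, written out explicitly.
    return -C_X * float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))


r = np.linspace(0.0, 30.0, 30001)        # radial grid in bohr
rho_h = np.exp(-2.0 * r) / np.pi         # exact hydrogen 1s density
e_x_lda = lda_exchange_energy(rho_h, r)  # in hartree
```

For this density the integral can also be done analytically and gives about −0.213 hartree, whereas the exact exchange energy of the hydrogen atom is −0.3125 hartree (it must cancel the self-repulsion of the single electron). The difference illustrates why gradient corrections and HF exchange are added at the higher levels of approximation.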

2.1.8 Multiconfigurational Short-Range DFT Method

In this section, a hybrid method that combines wave function theory (WFT) and density functional theory (DFT) is introduced. This method relies on the range separation of the two-electron repulsion operator into long-range and short-range parts60

\[ \hat{g}(1,2) = \hat{g}^{\mathrm{lr}}(1,2) + \hat{g}^{\mathrm{sr}}(1,2) \tag{37} \]

Several forms of range-separated operators have been developed. Here, we discuss the type of range-separated operator that was used in our calculations, which is based on the error function (erf)61,62

\[ \hat{g}^{\mathrm{lr}}(1,2) = \frac{\mathrm{erf}(\mu |\mathbf{r}_1 - \mathbf{r}_2|)}{|\mathbf{r}_1 - \mathbf{r}_2|} ; \qquad \hat{g}^{\mathrm{sr}}(1,2) = \frac{1 - \mathrm{erf}(\mu |\mathbf{r}_1 - \mathbf{r}_2|)}{|\mathbf{r}_1 - \mathbf{r}_2|} \tag{38} \]

where $\mu$ is the range-separation parameter. In the limiting cases, a value of $\mu = \infty$ implies that the DFT (short-range) part vanishes, giving a pure wave function method, whereas $\mu = 0$ results in a pure Kohn−Sham DFT method. The effective electronic Hamiltonian used in multiconfigurational short-range DFT (MC-srDFT) is

\[ \hat{H}^{\mu} = -\frac{1}{2} \sum_i \nabla_i^2 - \sum_i \sum_A \frac{Z_A}{|\mathbf{r}_i - \mathbf{R}_A|} + \sum_{i<j} \frac{\mathrm{erf}(\mu |\mathbf{r}_i - \mathbf{r}_j|)}{|\mathbf{r}_i - \mathbf{r}_j|} + \hat{V}_{H}^{\mathrm{sr},\mu} + \hat{V}_{xc}^{\mathrm{sr},\mu} \tag{39} \]

where $\hat{V}_{H}^{\mathrm{sr},\mu}$ is the short-range Hartree potential and $\hat{V}_{xc}^{\mathrm{sr},\mu}$ is the short-range adapted and $\mu$-dependent exchange−correlation (XC) potential, obtained from DFT. It should be stressed that specially designed exchange−correlation functionals are a prerequisite for range-separated methods. In our calculations, the short-range PBE-based srPBE functional was used.63,64 For the long-range part, the wave function was described by CASSCF; the combination is called CAS-srPBE.
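The erf-based splitting of Eq. 38 and its two limits can be checked numerically. A minimal Python sketch (the function names are hypothetical; distances and the range-separation parameter are in arbitrary but mutually consistent units):

```python
import math


def g_long_range(r: float, mu: float) -> float:
    """Long-range part of 1/r, handled by the wave function method."""
    return math.erf(mu * r) / r


def g_short_range(r: float, mu: float) -> float:
    """Short-range part of 1/r, handled by the DFT functional (erfc = 1 - erf)."""
    return math.erfc(mu * r) / r


# The two parts always add up to the full Coulomb operator 1/r.
max_deviation = max(
    abs(g_long_range(r, mu) + g_short_range(r, mu) - 1.0 / r)
    for r in (0.5, 1.0, 3.0)
    for mu in (0.1, 0.4, 2.0)
)
```

With `mu = 0` the long-range part vanishes for every distance, so everything is treated by DFT, and for very large `mu` the short-range part vanishes, leaving a pure wave function description, in line with the limits described above.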

2.2 Molecular Mechanics Methods

QM methods, which solve the Schrödinger equation, can at most handle ~1000 atoms, e.g. at the TPSS/def2-SV(P) level of theory. However, proteins typically contain tens of thousands of atoms, so for such systems the computational cost of a pure QM calculation would be too high. With the molecular mechanics (MM) method, proteins and other large systems can be simulated, because no attempt is made to solve the Schrödinger equation and electrons are ignored. With MM, molecules are described as a collection of balls connected by springs, and the system is described by an empirical function, a force field. For proteins, the potential energy typically contains terms for the distortion of bonds, angles, and dihedrals (torsions), as well as the non-bonded exchange-repulsion, dispersion (van der Waals interaction), and electrostatic interaction energies,

\[ E_{\mathrm{MM}} = E_{\mathrm{bond}} + E_{\mathrm{angle}} + E_{\mathrm{dihedral}} + E_{\mathrm{vdW}} + E_{\mathrm{el}} \tag{40} \]

Stretching a covalent bond is normally assumed to be harmonic. Thus, the bond energies can be expressed as

\[ E_{\mathrm{bond}} = \sum_{\mathrm{bonds}} k_b (r - r_0)^2 \tag{41} \]

where $k_b$ is the bond force constant and $r_0$ is the reference bond length. Likewise, the angle term, $E_{\mathrm{angle}}$, is also approximately described by a harmonic potential,

\[ E_{\mathrm{angle}} = \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2 \tag{42} \]

where $k_\theta$ is the angle force constant and $\theta_0$ is the reference angle. The dihedral term, $E_{\mathrm{dihedral}}$, is associated with the rotation around a bond, for which a periodic function is used:

\[ E_{\mathrm{dihedral}} = \sum_{\mathrm{dihedrals}} \frac{1}{2} V_n \left[ 1 + \cos(n\phi + \gamma) \right] \tag{43} \]

where $V_n$ is the force constant, $n$ is the periodicity, $\gamma$ is the phase of the torsion, and $\phi$ is the torsional angle. The $E_{\mathrm{vdW}}$ term uses a Lennard-Jones potential, which divides the interaction into a short-range repulsive term and a long-range attractive term,

\[ E_{\mathrm{vdW}} = \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{C_{ij}}{r_{ij}^{6}} \right) \tag{44} \]

where $A_{ij}$ and $C_{ij}$ are coefficients that depend on the atom types. The $r_{ij}^{-12}$ term describes the short-range interaction, i.e. the exchange-repulsion due to the overlap of the electron densities of the atoms. The $r_{ij}^{-6}$ term is the long-range interaction, i.e. the dispersion or van der Waals attraction. The last term, $E_{\mathrm{el}}$, is the Coulomb potential

\[ E_{\mathrm{el}} = \sum_{i<j} \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_r r_{ij}} \tag{45} \]

where $q_i$ and $q_j$ are the charges of the atoms, $\varepsilon_0$ is the permittivity of vacuum, $\varepsilon_r$ is the relative permittivity (also known as the dielectric constant) of the medium, and $r_{ij}$ is the distance between the atoms. With the MM method, the total energy of a protein can be calculated in seconds. Therefore, molecular dynamics (MD, based on Newton's second law of motion) or Monte Carlo (MC, based on random sampling) simulations can be run to study thermodynamic ensembles of structures. In addition, the protein is typically solvated with several thousands of explicit water molecules. Several water models can be used in the simulation, e.g. SPC (simple point charge), TIP3P (transferable intermolecular potential with three points), TIP4P, etc. The TIP3P model was used in the thesis,65 in which the potential energy between two water molecules a and b, $E_{ab}$, is determined by the intermolecular interaction between the sites, described by Coulomb and Lennard-Jones potentials,

\[ E_{ab} = \sum_{i \in a} \sum_{j \in b} \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} + \frac{A}{r_{\mathrm{OO}}^{12}} - \frac{C}{r_{\mathrm{OO}}^{6}} \tag{46} \]

where $q_i$ is the charge of a hydrogen or oxygen atom, and $q_{\mathrm{O}} = -2 q_{\mathrm{H}}$. A Lennard-Jones potential is used to describe the oxygen−oxygen interaction between two water molecules. In TIP3P, r(OH) = 0.9572 Å, ∠HOH = 104.52°, A = 5.82 × 10^5 kcal Å^12/mol, C = 595.0 kcal Å^6/mol, q(O) = −0.834 e, and q(H) = 0.417 e. The interactions between water and protein are described by the normal MM non-bonded terms (the fourth and fifth terms in Eq. 40). In addition, counter ions are often added to produce a neutral system. To mimic an infinite system, periodic boundary conditions are employed and long-range electrostatic interactions are often treated by Ewald summation.
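The non-bonded terms and the TIP3P parameters quoted above can be combined into a short Python sketch of the water−water pair energy of Eq. 46. The Coulomb conversion factor (332.0637 kcal Å mol⁻¹ e⁻²), the placement of the molecules and all function names are assumptions introduced for illustration; this is not the implementation used by any MD program.

```python
import math

COULOMB = 332.0637            # 1/(4 pi eps0) in kcal Å / (mol e^2); assumed conversion factor
A_OO = 5.82e5                 # kcal Å^12 / mol
C_OO = 595.0                  # kcal Å^6 / mol
Q_O, Q_H = -0.834, 0.417      # charges in e
R_OH = 0.9572                 # Å
THETA = math.radians(104.52)  # H-O-H angle


def water(origin=(0.0, 0.0, 0.0)):
    """Return [(charge, (x, y, z)), ...] for a rigid TIP3P water with O at `origin`."""
    ox, oy, oz = origin
    hx = R_OH * math.sin(THETA / 2.0)
    hy = R_OH * math.cos(THETA / 2.0)
    return [(Q_O, (ox, oy, oz)),
            (Q_H, (ox + hx, oy + hy, oz)),
            (Q_H, (ox - hx, oy + hy, oz))]


def pair_energy(mol_a, mol_b):
    """Interaction energy (kcal/mol): Coulomb over all site pairs plus O-O Lennard-Jones."""
    energy = 0.0
    for qi, ri in mol_a:
        for qj, rj in mol_b:
            energy += COULOMB * qi * qj / math.dist(ri, rj)
    r_oo = math.dist(mol_a[0][1], mol_b[0][1])
    return energy + A_OO / r_oo**12 - C_OO / r_oo**6


# Equivalent sigma/epsilon form of the O-O Lennard-Jones parameters.
sigma = (A_OO / C_OO) ** (1.0 / 6.0)     # distance at which the LJ energy is zero, Å
epsilon = C_OO**2 / (4.0 * A_OO)         # well depth, kcal/mol

far_energy = pair_energy(water(), water((0.0, 0.0, 100.0)))
```

Because each molecule is neutral, the interaction decays quickly with distance, so `far_energy` is essentially zero at 100 Å. The derived `sigma` and `epsilon` are the same Lennard-Jones parameters expressed in the alternative 4ε[(σ/r)^12 − (σ/r)^6] form.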

2.3 QM/MM Method

As discussed above, QM methods, which calculate the molecular electronic structure, can give a good description of reactions in small systems, whereas the MM method can handle large systems but does not give a good description of chemical reactivity, because electrons are neglected; for example, Eq. 41 shows that bonds cannot be cleaved. The QM/MM approach takes advantage of both the accuracy of QM methods and the speed of the MM method:66 A small but interesting part, e.g. where a chemical reaction occurs, is treated by QM, whereas the remainder of the protein and the surrounding solvent are described by MM.67 Therefore, QM/MM methods are widely used for enzymatic reactions. The total energy of the combined QM/MM method can be expressed as

\[ E_{\mathrm{QM/MM}} = E_{\mathrm{QM}} + E_{\mathrm{MM}} + E_{\mathrm{QM\text{-}MM}} \tag{47} \]

where $E_{\mathrm{QM}}$ is the energy of the QM region, $E_{\mathrm{MM}}$ is the energy of the MM region calculated by an MM force field, and $E_{\mathrm{QM\text{-}MM}}$ is the interaction between the QM and MM parts. $E_{\mathrm{QM\text{-}MM}}$ typically consists of three types of interactions: electrostatic interactions ($E^{\mathrm{el}}_{\mathrm{QM\text{-}MM}}$), van der Waals interactions ($E^{\mathrm{vdW}}_{\mathrm{QM\text{-}MM}}$), and MM-bonded interactions ($E^{\mathrm{bonded}}_{\mathrm{QM\text{-}MM}}$),

\[ E_{\mathrm{QM\text{-}MM}} = E^{\mathrm{el}}_{\mathrm{QM\text{-}MM}} + E^{\mathrm{vdW}}_{\mathrm{QM\text{-}MM}} + E^{\mathrm{bonded}}_{\mathrm{QM\text{-}MM}} \tag{48} \]

where $E^{\mathrm{el}}_{\mathrm{QM\text{-}MM}}$ is not calculated separately, but is included in the $E_{\mathrm{QM}}$ term in the electrostatic embedding (EE) approach. The Hamiltonian is

\[ \hat{H} = \hat{H}_{\mathrm{QM}} + \hat{H}^{\mathrm{el}}_{\mathrm{QM\text{-}MM}} \tag{49} \]

where $\hat{H}_{\mathrm{QM}}$ is the Hamiltonian for the QM region, and $\hat{H}^{\mathrm{el}}_{\mathrm{QM\text{-}MM}}$ is the Hamiltonian for the electrostatic interaction between the QM system (electrons and nuclei) and the MM system (modelled as point charges). $E^{\mathrm{vdW}}_{\mathrm{QM\text{-}MM}}$, which represents the van der Waals interaction (the dispersion interaction) and other short-range repulsive interactions, is normally described by a Lennard-Jones potential (Eq. 44). The last term, $E^{\mathrm{bonded}}_{\mathrm{QM\text{-}MM}}$, is employed only when there are chemical bonds between the QM and MM regions, and it is calculated with the same force field as $E_{\mathrm{MM}}$. In this thesis, the hydrogen-link (HL) atom approach is used for the QM/MM boundary.

In our calculations, the total simulated system is divided into three subsystems: system 1 is the QM region, which contains the active site of the enzyme, and systems 2 and 3 are MM regions. System 2 typically consists of all residues within ~6 Å of system 1, and may be relaxed by MM during the geometry optimisation. System 3 is the remaining part and is kept fixed. In our QM/MM approach, the energy is obtained from Eq. 50,

\[ E_{\mathrm{QM/MM}} = E^{\mathrm{HL}}_{\mathrm{QM1+ptch23}} + E^{\mathrm{CL}}_{\mathrm{MM123},q_1=0} - E^{\mathrm{HL}}_{\mathrm{MM1},q_1=0} \tag{50} \]

where $E^{\mathrm{HL}}_{\mathrm{QM1+ptch23}}$ is the QM energy of the QM region, truncated by HL atoms and embedded in a set of point charges modelling systems 2 and 3. $E^{\mathrm{HL}}_{\mathrm{MM1},q_1=0}$ is the MM energy of the QM system, still truncated by HL atoms, but without any electrostatic interactions. Finally, $E^{\mathrm{CL}}_{\mathrm{MM123},q_1=0}$ is the classical energy of all atoms in the system, with carbon-link (CL) atoms and with the charges of the QM system set to zero (to avoid double counting of the electrostatic interactions).
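The bookkeeping of the subtractive expression in Eq. 50 can be written out in a few lines of Python. The function and argument names are hypothetical, and the numbers in the example are placeholders chosen for illustration, not results from the thesis.

```python
def qmmm_energy(e_qm1_ptch23_hl: float, e_mm123_q0_cl: float, e_mm1_q0_hl: float) -> float:
    """Eq. 50: QM energy of system 1 in point charges, plus MM energy of the whole
    system (QM charges zeroed), minus MM energy of system 1 (QM charges zeroed)."""
    return e_qm1_ptch23_hl + e_mm123_q0_cl - e_mm1_q0_hl


def reaction_energy(state_a: dict, state_b: dict) -> float:
    """QM/MM energy difference B - A for two states described by the same three terms."""
    return qmmm_energy(**state_b) - qmmm_energy(**state_a)


# Placeholder energies in kJ/mol for two hypothetical states of the same system.
state_a = {"e_qm1_ptch23_hl": -1000.0, "e_mm123_q0_cl": -500.0, "e_mm1_q0_hl": -20.0}
state_b = {"e_qm1_ptch23_hl": -1030.0, "e_mm123_q0_cl": -495.0, "e_mm1_q0_hl": -18.0}
delta_e = reaction_energy(state_a, state_b)
```

The subtraction of the MM energy of system 1 removes the part of the full MM energy that is already described at the QM level, so that no interaction within the QM region is counted twice.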

2.4 Big-QM Approach

One of the problems with both the QM-cluster and QM/MM methods is that the energies vary strongly with the size of the selected QM region. The big-QM approach was developed to obtain converged results.68 In the big-QM calculations, all the important residues are included:68 All chemical groups within 4.5−6 Å of the minimal QM system, all buried charged groups in the protein, and two capped amino acids around each residue in the minimal QM system are included in the QM calculations. This typically gives a QM system of 600−1200 atoms. All the big-QM calculations in the thesis were performed on the coordinates from the QM/MM optimisation and with a point-charge model of the surroundings, because this gave the fastest calculations.68

2.5 QTCP Method

A serious issue with QM/MM methods is the local-minima problem, which is caused by the fact that a minimisation typically converges to the closest local minimum. There are no methods that always find the global minimum. In practice, it is not necessary that all groups are in their global minimum, but it is essential that they remain in the same local minimum throughout a reaction mechanism. A proper way to solve the problem is to calculate free energies, which involves sampling and averaging over all relevant thermally accessible structures.

Figure 6. The thermodynamic cycle used in QTCP.

In this thesis, the QM/MM thermodynamic cycle perturbation (QTCP) approach was used to calculate the free energy difference between two states at the QM/MM level of theory, using sampling only at the MM level.69-71 The QTCP method employs the thermodynamic cycle in Figure 6. The top arrow is the QM/MM free energy from state A to state B, but it is computationally too expensive to calculate directly. An alternative way to obtain the QM/MM free energy is based on Eq. 51,

\[ \Delta G_{\mathrm{QTCP}}(\mathrm{A} \to \mathrm{B}) = \Delta G_{\mathrm{MM}}(\mathrm{A} \to \mathrm{B}) + \Delta G_{\mathrm{MM} \to \mathrm{QM/MM}}(\mathrm{B}) - \Delta G_{\mathrm{MM} \to \mathrm{QM/MM}}(\mathrm{A}) \tag{51} \]

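The first term, corresponding to the lower horizontal line in Figure 6, is the perturbation from state A to state B at the MM level. This energy can be calculated by free energy perturbation (FEP) according to

\[ \Delta G_{\mathrm{MM}}(\mathrm{A} \to \mathrm{B}) = -k_B T \ln \left\langle \exp\left( -\frac{E_{\mathrm{MM}}(\mathrm{B}) - E_{\mathrm{MM}}(\mathrm{A})}{k_B T} \right) \right\rangle_{\mathrm{A}} \tag{52} \]

where $k_B$ is the Boltzmann constant, $T$ is the temperature, and the angular brackets indicate an average over an MD ensemble, sampled for state A. One should test the precision of the result by calculating the same quantity with the A and B states switched. The second and third terms are the free energy perturbations from the MM to the QM/MM description. They can be expressed as

\[ \Delta G_{\mathrm{MM} \to \mathrm{QM/MM}}(\mathrm{X}) = -k_B T \ln \left\langle \exp\left( -\frac{E_{\mathrm{QM/MM}}(\mathrm{X}) - E_{\mathrm{MM}}(\mathrm{X})}{k_B T} \right) \right\rangle_{\mathrm{MM,X}} \tag{53} \]

where X is either A or B, and the QM contribution to $E_{\mathrm{QM/MM}}(\mathrm{X})$ is the $E^{\mathrm{HL}}_{\mathrm{QM1+ptch23}}(\mathrm{X})$ term defined in Eq. 50.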
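The exponential average used in free energy perturbation (Eq. 52) can be sketched in a few lines of Python. The energy differences below are synthetic Gaussian random numbers, not simulation data, and the function name, the sample size and the temperature are assumptions made for illustration. For Gaussian-distributed energy differences with mean m and standard deviation s, the exponential average has the closed form m − s²/(2 k_B T), which gives a convenient check.

```python
import numpy as np

K_B = 0.0083144626   # Boltzmann constant in kJ/(mol K)


def fep_free_energy(delta_e, temperature=300.0):
    """Exponential (Zwanzig) average: -kT ln < exp(-dE/kT) >, evaluated stably."""
    kt = K_B * temperature
    x = -np.asarray(delta_e, dtype=float) / kt
    shift = x.max()                      # avoid overflow in the exponential
    return float(-kt * (shift + np.log(np.mean(np.exp(x - shift)))))


rng = np.random.default_rng(1)
# Synthetic samples of E(B) - E(A) in kJ/mol, as if collected from an MD ensemble of state A.
delta_e = rng.normal(loc=5.0, scale=1.0, size=200_000)

dg_fep = fep_free_energy(delta_e)
dg_gauss = 5.0 - 1.0**2 / (2.0 * K_B * 300.0)   # closed form for Gaussian samples
```

The free energy is always at or below the plain average of the energy differences, because the exponential average gives more weight to low-energy samples. This is also why the estimate converges poorly when the two states overlap little, and why the forward and backward perturbations should be compared.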

3 Summary of the Articles

In this thesis, the six papers are grouped into two themes: Theoretical studies of [NiFe] hydrogenase (Papers I−IV) and of Mo-containing enzymes (DMSO reductase and formate dehydrogenase, Papers V and VI).

3.1 Paper I

The [NiFe] hydrogenases catalyse the reversible conversion of protons and electrons to H2. However, the detailed reaction mechanism is unclear. The aim of paper I was to decide which of the Cys residues in the active site is most favourable to protonate. Therefore, the protonation states of the four Cys residues were studied in four putative states of the reaction mechanism, viz. the Ni-SIa, Ni-R, Ni-C and Ni-L states. For this purpose, a set of advanced methods was used: Geometries were optimised by the standard QM/MM approach with a small QM system. Accurate energies were calculated using the big-QM method (with 817 atoms in the QM region), including all chemical groups within 4.5 Å of a minimal model of the active site and all buried charged groups, and moving the junctions at least two residues away from the active site. These calculations were performed at the TPSS/def2-TZVP level and the energies were extrapolated to the B3LYP/def2-QZVPD level. To avoid the local-minima problem, QTCP calculations were also performed. With these methods, we compared the energies of the species with different protonation states of the four Cys residues in the four putative intermediates. Our results show that protonation of Cys-546 is most favourable for all four states, by 14−51 kJ/mol. For the Ni-R state, our results are consistent with a recent atomic-resolution crystal structure.4


3.2 Paper II

In paper II, we studied H2 binding to the active site of [NiFe] hydrogenase. Previous experimental and theoretical studies have suggested two different binding modes, viz. binding to Ni or binding to Fe. Moreover, the spin state of the Ni ion when H2 binds to the active site is not clear. From previous theoretical studies, we know that the energies of this specific system are sensitive to the DFT method and the size of the QM region, so that different DFT calculations have given different results. To address these problems, we employed CCSD(T), DMRG-CASPT2, and big-QM methods. The former two methods were used to calibrate the DFT methods, and the big-QM approach was used to avoid the dependence on the size of the QM system. In the CCSD(T) calculations, a minimal model (18 atoms) was used, because this method is computationally expensive. In the DMRG calculations, three models were used, with active spaces of 22 electrons in 22 orbitals for the singlet state and 24 electrons in 24 orbitals for the triplet state (CAS(22,22) and CAS(24,24)). For the big-QM calculations, 819 atoms were included, which is a much larger QM system than in any of the previous theoretical studies. All geometry optimisations were performed with the QM/MM method using the TPSS and B3LYP functionals, which allows us to evaluate which functional gives results consistent with CCSD(T) and DMRG. Our results show that H2 prefers to bind to the Ni ion in the singlet state, rather than to Fe, by at least 47 kJ/mol. For the triplet state, only the species with H2 bound to Fe was found. In addition, we found that for this case, the TPSS functional gave better energies than B3LYP.


3.3 Paper III

In this paper, we have employed a new combined multiconfigurational and DFT method, CAS-srDFT, to explore the H2 binding to the active site of [NiFe] hydrogenase. The advantage of this method is that it captures the dynamic correlation from the DFT calculations and the static correlation from the CASSCF wave function. We know from paper II that H2 prefers to bind to the Ni ion, and we therefore examined whether the CAS-srDFT calculations give the same conclusion. Since the dynamic correlation is calculated by DFT, the results become more accurate the better the exchange−correlation functional is. In this paper, we used the short-range PBE-based srPBE functional by Goll et al. In addition, the range separation is controlled by the parameter μ. First, we studied how the energies changed with different μ values. A value of μ = 0.4 gave the best results, which is in agreement with previous studies.61 The CASSCF method, used for the long-range part, is very sensitive to the orbitals in the active space. Therefore, one must ensure that the orbitals are comparable in different calculations. We employed three different sizes of the active space, viz. CAS(12,12), CAS(14,14), and CAS(16,16). In addition, three models were used to allow a comparison with paper II. The CAS-srPBE results show that H2 binding to Ni is more favourable than binding to Fe, which is consistent with the conclusion in paper II. For all three models, the effect of extending the active space from CAS(10,10) to CAS(14,14) was found to be small, ~2 kJ/mol. For model 1, we further employed CAS(16,16), which gave rise to a change of only 0.2 kJ/mol. Thus, the energies seem to be converged with CAS(14,14). This active space is much smaller than in the previous DMRG calculations, which employed CAS(22,22), showing that the computational cost of CAS-srDFT is much lower, because a smaller active space can be used and the DFT calculation is much cheaper than the CASPT2 calculations.


3.4 Paper IV

Based on our results from papers I and II, we know that Cys-546 is most easily protonated and that H2 prefers to bind to the Ni ion. However, the details of the reaction mechanism are still an open question, in particular whether the Ni-L state is involved in the reaction mechanism. In this paper, we have studied the full reaction mechanism of [NiFe] hydrogenase. QM/MM optimisations were carried out to obtain the geometries. More accurate energies were obtained by big-QM calculations with 819 atoms. Moreover, DMRG-CASSCF calculations were carried out to study the electronic structures of the various states in the reaction mechanism. Our calculations show that the Ni-L state is not involved in the reaction mechanism. Instead, the Ni-C state is reduced by one electron and then the bridging hydride ion is transferred to Cys-546 as a proton, while the two electrons are transferred to the Ni ion. The cleavage of the H−H bond is facile, with an energy barrier of 23 kJ/mol according to our calculations. We also find that the reaction energies are sensitive to the size of the QM system and the basis set, in agreement with our previous studies.


3.5 Paper V

In this paper, we have studied the effect of variations of the protein ligand in DMSO reductase. The DMSO reductase family is the largest and most diverse family of the mononuclear molybdenum oxygen-atom-transfer proteins. In the reaction cycle, the oxidation state of the Mo ion cycles between +IV and +VI. Remarkably, various members of the DMSO reductase family may employ three different protein-derived ligands (serine, cysteine, or selenocysteine). We have studied how the DMSO reductase reaction mechanism changes with alternative models of the active site, varying the protein-derived ligand. According to previous theoretical studies, the calculated barrier depends on the theoretical method, and a proper account of dispersion and solvation effects is needed, together with large basis sets and accurate density functional theory (DFT) methods. In this paper, geometries were optimised in the gas phase at the TPSS/def2-SV(P) level without any symmetry constraints. The energies were improved by single-point calculations using the B3LYP functional combined with the def2-TZVPD basis set. DFT-D3 dispersion corrections were applied to all single-point calculations and solvent effects were considered by the COSMO continuum solvent model with a dielectric constant of 4 to mimic the protein surroundings. Our results show that the same mechanism is obtained with the Ser, Cys, SeCys, OH– and SH– models of the protein-derived ligand: The DMSO substrate first binds to the Mo(IV) ion and then the S−O bond of DMSO is cleaved to generate the product DMS. All five models gave similar activation barriers of 69−85 kJ/mol. However, with the doubly charged O2– and S2– models, the activation barriers were much higher, 212 and 168 kJ/mol, indicating that the oxo and sulfido ligands are likely to be protonated to OH– and SH– during the reaction of enzymes employing these ligands.


3.6 Paper VI

Formate dehydrogenases (FDHs) catalyse the reversible conversion of formate to carbon dioxide. In paper VI, we have studied the reaction mechanism of Mo-containing formate dehydrogenase. In previous experimental and theoretical studies, five putative mechanisms have been suggested. For these mechanisms, there are two important controversial questions: a) Does the cysteine in the active site dissociate from Mo during the reaction? and b) Does the substrate formate bind directly to the Mo ion or not? In the previous theoretical studies, small models were used and the protein surroundings were ignored. In this paper, the geometries were optimised with QM/MM methods, so that the protein environment was included at the MM level. Based on the QM/MM-optimised structures, we ran big-QM calculations with 1121 atoms to obtain reliable energies. Moreover, thermal corrections from vibrational frequency calculations were added to the final energies. Our results indicate that the formate substrate does not bind directly to the Mo ion, but instead resides in the second coordination sphere. There, the sulfido group abstracts a hydride from formate, resulting in a Mo(IV)−SH state. Initially, the CO2 product forms a thiocarbonate group with the Cys ligand. This step is quite favourable, with an activation energy of 28 kJ/mol and a reaction energy of –39 kJ/mol. However, the CO2 product is not released until the active site is oxidised by two electrons.


4 Conclusions and Outlook

In this thesis, we have studied three metalloenzymes, viz. [NiFe] hydrogenase, dimethyl sulfoxide reductase, and formate dehydrogenase.

For [NiFe] hydrogenase, we have studied the protonation states of the four cysteine residues in the active site, the H2 binding site and the full reaction mechanism. This has allowed us to draw a clear picture of the reaction mechanism. However, some points are still unclear. First, in the crystal structure, the glutamate residue that forms a hydrogen bond to the terminal protonated cysteine in the active site is deprotonated, whereas in our calculations, the proton always moves from the cysteine to the glutamate. It would be interesting to study the crystal structure with theoretical methods, in order to calibrate our methods. Second, in our QM/MM studies, the big-QM approach was used to obtain stable QM energies. However, this approach is expensive and it introduces the problem of deciding which residues have large effects on the energy. Finally, why is the crystal structure of hydrogenase more similar to the calculated structure of the triplet state, although our calculations indicate that this state is strongly unfavourable? We are currently investigating these questions in our group.

For the Mo-containing enzymes, dimethyl sulfoxide reductase and formate dehydrogenase, we have studied the effect of variations in the protein-derived ligand for the former enzyme and the full reaction mechanism for the latter. We will continue to investigate enzymes that involve a Mo ion, e.g. nitrate reductases. The latter enzymes have an active site with the same Mo coordination sphere as formate dehydrogenase. However, it is still an open question whether the two enzymes follow a similar reaction mechanism. We are currently studying this enzyme in our group.

In paper III, we have studied the H2 binding in the active site of [NiFe] hydrogenase with the CAS-srDFT method, but we did not consider the triplet state, because open-shell calculations were not available at that time. Such code has now been implemented in the DALTON package, which allows us to evaluate whether the CAS-srDFT method can handle the energy difference between the singlet and triplet states for this enzyme.

As DMRG can handle a large active space (~50 electrons in 50 orbitals), it can be used to calibrate DFT methods in some specific cases, e.g. for systems in which two or more metals are involved. It is also possible to study chemical reactions, e.g. bond cleavage, if the necessary orbitals are included in the active space. Currently, we are investigating the lytic polysaccharide monooxygenases with CASSCF/CASPT2. However, an active space of CAS(16,16) or CAS(18,18) might be too small to produce comparable energies for the reactant, transition and product states. Therefore, DMRG is a possible solution for this case.


Acknowledgements

Time flies! I have spent almost four years at Lund University. I remember when I was in China four years ago: I was so excited that I could go to Sweden, but at the same time I was also afraid, because I did not know what my future life would be like. Now I can give myself the answer: I really enjoyed my time here and I have learnt a lot, both in research and in life. I am very grateful to you, my friends; without you I could not have had such a great experience.

First of all, I would like to thank my supervisor, Ulf Ryde. You were always patient and positive when we discussed our research, even when I made some stupid mistakes. This means a lot to me. You have helped me a great deal, not only in my research, but also in my life.

I would also like to thank my co-supervisor, Ebbe Nordlander, for your inspiring discussions. Although we did not meet very often, every time I talked with you, I got some new ideas for my research.

I am thankful to everyone in my group, Paulius, Francesco, Martin, Majda, Lili, Octav, Erik and all visitors. When I came to KC for the first time, Paulius brought me to the student centre to help me apply for the entrance card. You also taught me how to install software on a Linux system. Francesco, you were always nice when I asked you questions and you helped me a lot in my research; I miss you. Martin, thank you for telling me so much about Sweden, and thank you for your help with the programming course. Majda, thank you for your help in many respects: courses, research, travel and so on. Lili, thank you for your advice on research and life, e.g. how to treat people, how to improve my appearance, etc. Octav, you are so clever; you know a lot and learn fast, and you helped me a lot in my research and courses. Erik, thank you for teaching me DMRG, CAS, and srDFT calculations. I am very happy that I could collaborate with you in my research. I thank all the visitors, Azar, Adrian, Christine, Nadja, Fatemeh, Casper, Adam, Alfonso, Melanie, Meiting, and Anna.

I also thank Pär, Soumendranath and Esko. I really enjoyed the discussions in our journal club.

I am thankful to Kristine, Quan and Simon. Because of you, I had a nice experience when I was in Leuven. I learnt a lot about DMRG and CCSD(T). Also, I miss the time when we went bowling together.

I would like to thank the DMRG group in our department, Valera, Per-Åke, Martin, and Erik. I learnt a lot from you.

I am thankful to Fredrik, my officemate during my first year in KC. You helped me a lot, in both research and daily life.

I am very grateful to my Chinese friends. Wei, you were the first Chinese friend I met in Sweden. I clearly remember that it was around 11 o’clock at night on 15th September 2014. I called you because I was so hungry, and you brought some bread and apples. The next day, you brought me to KC. Around 11 in the morning, Fei came to me: “I heard that a Chinese student came here in Ulf’s group”, Fei said to me. Then, you introduced me to Ruiyu. You told me how to get an apartment, how to register for a personal number and so on. Some days later, I met Weimin at Annil’s party, and I learnt about YouTube from you. This was the first time I heard about YouTube. After that, I knew what to do when I was at home. Months later, Hongduo came back from maternity leave, and we had a long talk about how to enjoy my stay here in Sweden. From then on, I met many Chinese friends, Xiaoting, Feifei, Junsheng, Wei Dang, Delin, Bin, Ke, Meina, Lingdong, Qianjin, Junhao, Fenying … I now have more friends in Lund than I can fit here. I appreciate you all very much, and because of you, my life has become more and more beautiful.

I would like to thank Ingrid, Helena, and Maria for taking care of us. I remember that on my first day in KC (my second day in Sweden), Ingrid, you helped me to register for my PhD. You are so nice and patient. Helena and Maria, you helped me a lot when I had problems at KC.

Finally, I would like to thank my family. My parents, you have always supported me during these four years. You always said “take care of yourself, your father and I are fine, and do not worry about us”. I could not have gone further without your support. I also thank my elder sister. Because you take care of our parents, I can study in Sweden. My dear wife, Xiaoqiong, thank you for your company in my life. I love you forever. Thank you very much!

Geng Dong
