PostEra

Protonation States of SARS-CoV-2

@frankvondelft suggested I initiate a discussion here about the protonation state of the SARS-CoV-2 main protease (Mpro). Depending on the method, the protonation state we assume will influence the results we get from our modeling.

Regarding protonation states of the catalytic dyad (Cys-145 and His-41), check out this excellent thesis from 2013 by Alexander Paasche, who modeled SARS-CoV Mpro using extensive MD and QM/MM calculations:

OPUS Würzburg | Mechanistic Insights into SARS Coronavirus Main Protease by Computational Chemistry Methods

Paasche says, regarding SARS-CoV Mpro:

“A further remarkable feature of the catalytic site is the protonation state of the Cys/His dyad. Experimental studies on the active-site residues of SARS-CoV Mpro estimate the pKa values of Cys145 in the range of 7.7-8.3 and His41 in the range of 6.2-6.4.[58,71] This means that they rest in an uncharged state with a protonated thiol moiety. Since SARS-CoV Mpro is a cysteine protease, this finding distinguishes it from typical enzymes of this protease class, which normally possess a thiolate/imidazolium ion pair for their catalytic Cys/His dyad.[94–97] However, the existence of exceptions from this general characteristic has been shown few years before SARS-CoV Mpro was discovered at the example of a picornavirus protease that is also a cysteine protease but does not form an ion pair at its catalytic site.[98,99] Interestingly, the fold of picornavirus 3C protease has a close similarity to the serine protease chymotrypsin, which is in line with observations at SARS-CoV Mpro, for which a similar structural relationship is found.[100] There are also more recent examples of cysteine/histidine containing active-sites that reside in an uncharged state.[101] Hence, the paradigma that cysteine proteases usually possess a thiolate/imidazolium ion pair, seems to vanish step by step.[102] The current understanding of the catalytic mechanism indicates that SARS-CoV Mpro exhibits mechanistic features that are quite different to the archetypical proteases papain and chymotrypsin, but also shares common features with them.[58,71] The common enzymatic mechanisms for these two “prototype proteins” are the ion-pair mechanism for papain,[103] and the general-base catalysis for chymotrypsin,[104] respectively. Concerning the peptide cleavage reaction mediated by SARS-CoV Mpro, experimental results indicate that an ion-pair seems clearly to be involved into the mechanism. 
This is similar to papain, but on the other side the catalytic activity is found to be independent of the chemical reactivity of the substrate.[58] This is counter intuitive, since substrates with increased reactivity should react faster with the thiolate moiety of cysteine and thus proceed faster within the enzyme catalysis cycle. Derived from pKa measurements that are indicative for a general-base mechanism,[71] the existence of an “electrostatic trigger” is postulated[58] that denotes the formation of an ion pair of SARS-CoV Mpro’s catalytic dyad as a first reaction step and seems furthermore the rate-determining step for enzyme catalysis. This hypotheses could conclusively explain why the catalytic activity is independent from the reactivity of the substrate.”

So SARS-CoV Mpro rests in the neutral state, i.e. “Cys-His”. Substrates seem to be able to trigger proton transfer and the formation of the Cys(-)-His(+) zwitterionic catalytic dyad, and do so more readily than warhead-containing inhibitors that end up covalently bound. Paasche asserts that the coverage of the active site seems to play a role: substrates, which cover more of the active site than inhibitors do, trigger zwitterion formation more readily. My quick reading of the chapter suggests the example inhibitor studied was “AG77088” (aka “N9”, https://www.rcsb.org/ligand/9IN, in 2amd), which is very similar in 2D and 3D to N3 (aka PRD_002214, https://www.rcsb.org/ligand/PRD_002214, in 6lu7).
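To put rough numbers on "rests in the neutral state", here is a minimal sketch that applies the Henderson-Hasselbalch relation to the pKa ranges quoted above. It assumes pH 7.0 and treats Cys145 and His41 as independent titrating groups, which is a simplification if the two residues are coupled. The function name is just illustrative.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 7.0  # assumption: roughly neutral pH

# pKa ranges as quoted from Paasche's thesis (SARS-CoV Mpro)
CYS145_PKA = (7.7, 8.3)
HIS41_PKA = (6.2, 6.4)

for pka in CYS145_PKA:
    thiolate = fraction_deprotonated(pka, PH)  # Cys-S(-) fraction
    print(f"Cys145 pKa {pka}: thiolate fraction   = {thiolate:.2f}")

for pka in HIS41_PKA:
    imidazolium = 1.0 - fraction_deprotonated(pka, PH)  # His-H(+) fraction
    print(f"His41  pKa {pka}: imidazolium fraction = {imidazolium:.2f}")
```

Under these assumptions both charged forms are minority species at pH 7 (well under half in each case), which is consistent with the neutral Cys-His resting state described in the thesis.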

For any reaction to occur with a covalent warhead, the thiol proton has to leave at some point, and the neutral imidazole of His-41 stands at the ready to receive it; just before reacting, you would have a thiolate anion and an imidazolium cation…

I think @tdudgeon is right to recommend both forms. My collaborators at Bristol have used both the Cys-His and Cys(-)-His(+) forms, using their docking code, BUDE. I have used the zwitterionic form for my dockings of covalent binders. The neutral form can tell us about the initial “coverage” of the active site by our compounds or modeled substrates.

Regarding “coverage” by substrates of the active site, here are the sequences of the 11 cleavage sites for SARS-CoV and SARS-CoV-2 Mpro — just like the enzymes, the corresponding cleavage sites ("#" column) have very high sequence identity:

Virus       Source    Accession    #   P5  P4  P3  P2  P1  P1’ P2’ P3’ P4’ P5’
SARS-CoV    GMM/Code  NC_004718.3  1   S   A   V   L   Q   S   G   F   R   K
SARS-CoV    GMM/Code  NC_004718.3  2   G   V   T   F   Q   G   K   F   K   K
SARS-CoV    GMM/Code  NC_004718.3  3   V   A   T   V   Q   S   K   M   S   D
SARS-CoV    GMM/Code  NC_004718.3  4   R   A   T   L   Q   A   I   A   S   E
SARS-CoV    GMM/Code  NC_004718.3  5   A   V   K   L   Q   N   N   E   L   S
SARS-CoV    GMM/Code  NC_004718.3  6   T   V   R   L   Q   A   G   N   A   T
SARS-CoV    GMM/Code  NC_004718.3  7   E   P   L   M   Q   S   A   D   A   S
SARS-CoV    GMM/Code  NC_004718.3  8   H   T   V   L   Q   A   V   G   A   C
SARS-CoV    GMM/Code  NC_004718.3  9   V   A   T   L   Q   A   E   N   V   T
SARS-CoV    GMM/Code  NC_004718.3  10  F   T   R   L   Q   S   L   E   N   V
SARS-CoV    GMM/Code  NC_004718.3  11  Y   P   K   L   Q   A   S   Q   A   W
SARS-CoV-2  GMM/Code  MN908947     1   S   A   V   L   Q   S   G   F   R   K
SARS-CoV-2  GMM/Code  MN908947     2   G   V   T   F   Q   S   A   V   K   R
SARS-CoV-2  GMM/Code  MN908947     3   V   A   T   V   Q   S   K   M   S   D
SARS-CoV-2  GMM/Code  MN908947     4   R   A   T   L   Q   A   I   A   S   E
SARS-CoV-2  GMM/Code  MN908947     5   A   V   K   L   Q   N   N   E   L   S
SARS-CoV-2  GMM/Code  MN908947     6   T   V   R   L   Q   A   G   N   A   T
SARS-CoV-2  GMM/Code  MN908947     7   E   P   M   L   Q   S   A   D   A   Q
SARS-CoV-2  GMM/Code  MN908947     8   H   T   V   L   Q   A   V   G   A   C
SARS-CoV-2  GMM/Code  MN908947     9   V   A   T   L   Q   A   E   N   V   T
SARS-CoV-2  GMM/Code  MN908947     10  F   T   R   L   Q   S   L   E   N   V
SARS-CoV-2  GMM/Code  MN908947     11  Y   P   K   L   Q   S   S   Q   A   W
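For anyone who wants to check the "very high sequence identity" claim, here is a small sketch that compares the two sets of cleavage sites position by position. The sequences are copied from the table above; the variable and function names are just illustrative.

```python
POSITIONS = ["P5", "P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'", "P5'"]

# Cleavage sites 1-11, P5 through P5', copied from the table
SARS_COV = [
    "SAVLQSGFRK", "GVTFQGKFKK", "VATVQSKMSD", "RATLQAIASE",
    "AVKLQNNELS", "TVRLQAGNAT", "EPLMQSADAS", "HTVLQAVGAC",
    "VATLQAENVT", "FTRLQSLENV", "YPKLQASQAW",
]
SARS_COV_2 = [
    "SAVLQSGFRK", "GVTFQSAVKR", "VATVQSKMSD", "RATLQAIASE",
    "AVKLQNNELS", "TVRLQAGNAT", "EPMLQSADAQ", "HTVLQAVGAC",
    "VATLQAENVT", "FTRLQSLENV", "YPKLQSSQAW",
]


def site_differences(seq_a: str, seq_b: str) -> list[tuple[str, str, str]]:
    """Return (position, residue_a, residue_b) for every mismatched position."""
    return [
        (pos, a, b)
        for pos, a, b in zip(POSITIONS, seq_a, seq_b)
        if a != b
    ]


differences = {
    site: site_differences(a, b)
    for site, (a, b) in enumerate(zip(SARS_COV, SARS_COV_2), start=1)
}
n_mismatches = sum(len(d) for d in differences.values())
n_total = len(POSITIONS) * len(SARS_COV)

for site, diffs in differences.items():
    if diffs:
        print(f"site {site}: " + ", ".join(f"{p} {a}->{b}" for p, a, b in diffs))
print(f"identity: {n_total - n_mismatches}/{n_total}")
```

Only sites 2, 7 and 11 differ between the two viruses, and P1 (Gln) is identical at every site.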

I have just completed models of all 11 substrates in the active site of SARS-CoV-2 Mpro using the 2q6g PDB entry, which contains an inactivated mutant of SARS-CoV Mpro (H41A) with the most readily cleaved substrate, TSAVLQ*SGFRK. These SARS-CoV-2 + substrate models should give us insight into the relationship between the sequence specificity and the 3D shape of the pocket, plus ideas for new inhibitor designs.

I’m eager to hear everyone’s thoughts on this:

  • How transferable are Paasche’s results for SARS-CoV Mpro to SARS-CoV-2 Mpro? (I’m guessing they are similar enough that it is a reasonable assumption, but we all know the subtlest of differences can sometimes have large effects.)
  • Does anyone have citations for the experimental pKa values for the Cys and His in SARS-CoV-2 Mpro?
  • To what extent does Paasche’s “coverage hypothesis” hold? From my quick reading of the thesis he looked at one inhibitor (N9, 2amd) and one P6-P5’ substrate (TSAVLQ*SGFRK, 2q6g).

My chemical intuition says we should use the zwitterionic form for covalent inhibitors; and the neutral form for non-covalent inhibitors. Using both forms kills all birds with two stones.

Related to this, I’d like to ask if anyone has done a systematic analysis of conserved waters?
I’ve had a quick look and it does seem that there is one conserved water in the fragment screening structures (I’ve so far looked at only 4 of them) that seems to correspond to the key water that Paasche describes, in that it interacts with Arg40, His41, His164 and Asp187.

What seems particularly interesting is that this water is conserved even in Mpro-x0397, the one fragment hit that comes close to Cys145 and in doing so pushes Cys145 and His41 away. A water is still present in the same location but has lost its interaction with the His41 side chain (but not the others or the His41 backbone amine).

Maybe our simulations should be including that water? It seems to be too deep and integral to try to displace it.

Has anyone looked at this in more detail?

BTW it’s water 6 in the structures I’ve looked at (Mpro-x0387, Mpro-x0397, Mpro-x0874, Mpro-x1249).

@garrett, can you post those models of the 11 substrates?

Sure; they are somewhat “quick-and-dirty”, and I’m hoping someone will pick them up and run MD on them, and then QM to look at the mechanism.

We can already look at the “coverage” of the active site by substrates versus inhibitors.

The link is on the University’s OneDrive so might not be visible to the outside world. Where should I put them?

You can check them out (you might need to request permission):

https://unioxfordnexus-my.sharepoint.com/:f:/g/personal/dtce0011_ox_ac_uk/EqVyc-9FOE9DjP0pDLTNGCEBGj7VRVufO60mU8mLzD1_tQ

I built energy-minimized models of the 11 native cleavage site substrates from P6 to P5’ bound to SARS-CoV-2 Mpro. PDB entry 6lu7 (minus the inhibitor N3) was used as the starting conformation. I only built the substrates into one active site in the dimer (chain A). The modeling was greatly aided by the X-ray crystal structure of an inactivated SARS-CoV Mpro mutant, H41A, bound to what I believe is the fastest-cleaving site, T S A V L Q - S G F R K. This was a great starting point, it turned out.

I built the initial substrates using the most common backbone-dependent rotamer for each position. The substrate has an extended conformation in the crystal structure of SARS-CoV Mpro H41A in 2q6g, in all of my SARS-CoV-2 models, and interestingly, & FWIW, in all HIV-1 PR D25N substrate complexes. The peptidomimetic N3 in 6lu7 also has an extended conformation where it mimics a peptide. Where necessary, I tried to choose the least sterically clashing and most chemically complementary rotamer in each case.

The models were then run through MOE’s structure preparation, Protonate 3D (at pH 7), and then AMBER10-EHT force-field-based minimization until the change in energy was less than 0.1 kcal/mol/iteration. I let the whole dimer plus substrate (plus incidental waters) relax. Most substrates terminated minimization after fewer than 150 iterations, but a couple went for up to about 300 iterations. These were very quick minimizations, simply to eliminate any steric clashes and optimize local H-bond networks.
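To make the stopping rule concrete, here is a toy sketch of a minimization that terminates once the energy change per iteration drops below 0.1 (kcal/mol in the post above, unitless here). This is not MOE's minimizer: it is plain steepest descent on a made-up one-dimensional harmonic potential, and every name and parameter value in it is an assumption chosen for illustration.

```python
def minimize(energy, gradient, x0, step=0.005, tol=0.1, max_iter=10_000):
    """Steepest descent that stops when |E_prev - E_new| < tol.

    Returns (x, energy, iterations_used).
    """
    x, e = x0, energy(x0)
    for iteration in range(1, max_iter + 1):
        x_new = x - step * gradient(x)   # move downhill along the gradient
        e_new = energy(x_new)
        delta = abs(e - e_new)           # energy change for this iteration
        x, e = x_new, e_new
        if delta < tol:                  # convergence criterion from the post
            return x, e, iteration
    return x, e, max_iter


# Hypothetical harmonic "clash" term: E = 0.5 * k * x^2
K = 100.0
energy = lambda x: 0.5 * K * x * x
gradient = lambda x: K * x

x_min, e_min, n_iter = minimize(energy, gradient, x0=2.0)
print(f"stopped after {n_iter} iterations at E = {e_min:.4f}")
```

An energy-change criterion like this stops quickly once the worst clashes are relieved, which fits the "quick-and-dirty" intent; it does not guarantee a tight local minimum.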

The exercise of building the models reveals a lot about the stereochemical constraints of substrate specificity at each position from P6 through P5’. As you might expect, those positions facing “into” the Mpro are much more constrained than those facing “away”. Here is a grid image of the 2q6g crystal structure (SARS-CoV Mpro H41A + T S A V L Q - S G F R K in pink) and the 11 minimized models of SARS-CoV-2 Mpro and the 11 native cleavage sites in orange:

[Image.png]

It’s interesting how the substrate in positions P5-P1 H-bonds alongside a β-sheet in the Mpro, effectively making it one strand ‘wider’.

Note that Protonate 3D treated the His-41 and Cys-145 in SARS-CoV-2 as both neutral at pH 7: I did not force these to be zwitterionic.

This weekend I also sent the list of sequences of the 11 native cleavage sites to Michel Sanner at Scripps Research, for docking to SARS-CoV-2 Mpro with AutoDock CrankPep.

Like I said, I’m hoping Phil Biggin or Fernanda Duarte will use these substrate models as starting points for MD simulations, then Luigi Genovese at CEA could run his DFT methods on the outputs?

It is striking how conserved the P2 and P1 sites are, and what perfect fits they make with both SARS-CoV and SARS-CoV-2 Mpro. In all of my minimized models, the backbone NH H-bonds around the oxyanion hole to the key carbonyl oxygen in the substrate seemed to be preserved (visual inspection).

Paasche, in his 2013 thesis, posits that the greater “coverage” of the active site by the substrate, compared with the inhibitor he studied, led to a greater reduction in the activation barrier for proton transfer from C145 to H41, and thus to subsequent formation of the zwitterionic catalytic dyad, His(+)-Cys(-). Presumably we can now test this hypothesis with MD and either QM/MM or DFT.

All the best,

Garrett

One of Fernanda Duarte’s DPhil students, Tristan, has been looking at conserved waters in the DE Shaw Research trajectory of Mpro— but not across all known Mpro structures. I can raise this at our next Oxford/Bristol/CEA meeting.

It would make sense to use both forms for docking studies. What is the basis for the assignment of pKa values to the cysteine thiol and histidine imidazole in the experimental studies? It may be difficult to do a clear assignment if the thiolate anion and imidazolium cation are hydrogen bonded to each other.

Before docking, we thought that Cys-145 could form a zwitterion with His-41 during the reaction, so we (Jimmy Stewart) ran a B3LYP calculation using a 6-311G basis set and the default solvation on both the zwitterion and the neutral form, starting with the geometry of these two residues as they exist in 4MDS. The neutral form changed to the zwitterionic form. At equilibrium, the S-N distance was 3.16 Angstroms. In the X-ray structure, the environment of these two residues puts the S-N distance at 3.75 Angstroms. The reaction coordinate shows a flat reaction profile, so we agree with more or less everything that Garrett said in the first message, especially: “This means that they rest in an uncharged state with a protonated thiol moiety.” And it seems entirely reasonable that a substrate that excludes water should shift the equilibrium toward the protonated histidine.
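For anyone wanting to measure the S-N distance in their own structures, here is a minimal sketch that reads fixed-column PDB ATOM records and returns the distance between two named atoms. It assumes the relevant atoms are Cys145 SG and His41 NE2 in chain A, which is my guess at what was measured. The two atom lines in the usage example are synthetic placeholders placed 3.75 Angstroms apart for testing, not coordinates taken from 4MDS.

```python
import math


def parse_atom(line: str):
    """Extract (chain, residue number, atom name) and xyz from a PDB ATOM/HETATM line."""
    key = (line[21], int(line[22:26]), line[12:16].strip())
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return key, xyz


def atom_distance(pdb_lines, atom_a, atom_b) -> float:
    """Distance between two atoms, each given as (chain, resseq, atom_name)."""
    coords = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key, xyz = parse_atom(line)
            coords[key] = xyz
    return math.dist(coords[atom_a], coords[atom_b])


def make_atom_line(serial, name, resname, chain, resseq, x, y, z) -> str:
    """Build a fixed-column ATOM record (helper for the synthetic example only)."""
    return (
        f"{'ATOM':<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}"
        f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Synthetic placeholder coordinates, NOT from any real structure
example = [
    make_atom_line(1, "NE2", "HIS", "A", 41, 3.75, 0.0, 0.0),
    make_atom_line(2, "SG", "CYS", "A", 145, 0.0, 0.0, 0.0),
]
d = atom_distance(example, ("A", 145, "SG"), ("A", 41, "NE2"))
print(f"S-N distance: {d:.2f} Angstroms")
```

With a real file you would pass `open("your_structure.pdb")` as `pdb_lines`; alternate locations and multiple models are not handled in this sketch.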