@frankvondelft suggested I initiate a discussion here about the protonation state of SARS-CoV-2 Mpro. Depending on the method, the choice of protonation state will influence the kinds of results we get from our modeling.
Regarding protonation states of the catalytic dyad (Cys-145 and His-41), check out this excellent thesis from 2013 by Alexander Paasche, who modeled SARS-CoV Mpro using extensive MD and QM/MM calculations:
Paasche says, regarding SARS-CoV Mpro:
“A further remarkable feature of the catalytic site is the protonation state of the Cys/His dyad. Experimental studies on the active-site residues of SARS-CoV Mpro estimate the pKa values of Cys145 in the range of 7.7-8.3 and His41 in the range of 6.2-6.4.[58,71] This means that they rest in an uncharged state with a protonated thiol moiety. Since SARS-CoV Mpro is a cysteine protease, this finding distinguishes it from typical enzymes of this protease class, which normally possess a thiolate/imidazolium ion pair for their catalytic Cys/His dyad.[94–97] However, the existence of exceptions from this general characteristic has been shown few years before SARS-CoV Mpro was discovered at the example of a picornavirus protease that is also a cysteine protease but does not form an ion pair at its catalytic site.[98,99] Interestingly, the fold of picornavirus 3C protease has a close similarity to the serine protease chymotrypsin, which is in line with observations at SARS-CoV Mpro, for which a similar structural relationship is found.[100] There are also more recent examples of cysteine/histidine containing active-sites that reside in an uncharged state.[101] Hence, the paradigma that cysteine proteases usually possess a thiolate/imidazolium ion pair, seems to vanish step by step.[102] The current understanding of the catalytic mechanism indicates that SARS-CoV Mpro exhibits mechanistic features that are quite different to the archetypical proteases papain and chymotrypsin, but also shares common features with them.[58,71] The common enzymatic mechanisms for these two “prototype proteins” are the ion-pair mechanism for papain,[103] and the general-base catalysis for chymotrypsin,[104] respectively. Concerning the peptide cleavage reaction mediated by SARS-CoV Mpro, experimental results indicate that an ion-pair seems clearly to be involved into the mechanism. 
This is similar to papain, but on the other side the catalytic activity is found to be independent of the chemical reactivity of the substrate.[58] This is counter intuitive, since substrates with increased reactivity should react faster with the thiolate moiety of cysteine and thus proceed faster within the enzyme catalysis cycle. Derived from pKa measurements that are indicative for a general-base mechanism,[71] the existence of an “electrostatic trigger” is postulated[58] that denotes the formation of an ion pair of SARS-CoV Mpro’s catalytic dyad as a first reaction step and seems furthermore the rate-determining step for enzyme catalysis. This hypotheses could conclusively explain why the catalytic activity is independent from the reactivity of the substrate.”
So SARS-CoV Mpro rests in the neutral state, i.e. “Cys-His”. Substrates seem to be able to trigger proton transfer and formation of the Cys(-)-His(+) zwitterionic catalytic dyad, and to do so more readily than warhead-containing inhibitors that end up covalently bound. Paasche asserts that coverage of the active site seems to play a role: substrates, which cover more of the active site than inhibitors do, trigger zwitterion formation more readily. From my quick reading of the chapter, the example inhibitor studied was “AG77088”, aka “N9” (https://www.rcsb.org/ligand/9IN in 2amd), which is very similar in 2D and 3D to N3 (aka PRD_002214, https://www.rcsb.org/ligand/PRD_002214 in 6lu7).
For any reaction to occur with a covalent warhead, the thiol proton has to leave at some point, and the neutral imidazole of His-41 stands at the ready to receive it; just before reacting, you would have a thiolate anion and an imidazolium cation…
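
As a rough sanity check on the “neutral resting state” reading, here is a minimal Henderson-Hasselbalch sketch using the pKa ranges quoted above. The pH of 7.4 is my assumption, and treating the two residues as independent titratable sites is a simplification (in the enzyme they are coupled), so the numbers are only indicative.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an acid HA present as A-."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_protonated(pka, ph):
    """Complement of fraction_deprotonated: fraction present as HA."""
    return 1.0 - fraction_deprotonated(pka, ph)


PH = 7.4  # assumed pH, not taken from the thesis

# pKa ranges quoted by Paasche for SARS-CoV Mpro
CYS145_PKA = (7.7, 8.3)
HIS41_PKA = (6.2, 6.4)

for pka in CYS145_PKA:
    # Deprotonated Cys is the thiolate
    print(f"Cys145 pKa {pka}: {fraction_deprotonated(pka, PH):.1%} thiolate")
for pka in HIS41_PKA:
    # Protonated His is the imidazolium
    print(f"His41  pKa {pka}: {fraction_protonated(pka, PH):.1%} imidazolium")
```

Under these assumptions both charged forms are minority species at pH 7.4 (roughly a tenth to a third thiolate, and under a tenth imidazolium), which fits a predominantly neutral dyad but also shows that the charged forms are not negligible.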
I think @tdudgeon is right to recommend both forms. My collaborators at Bristol have docked against both the Cys-His and Cys(-)-His(+) forms with their docking code, BUDE. I have used the zwitterionic form for my dockings of covalent binders. The neutral form can tell us about the initial “coverage” of the active site by our compounds or modeled substrates.
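
For force-field-based workflows, one way to prepare both forms is to switch residue names before hydrogens are added. The sketch below assumes AMBER-style naming (CYS/CYM for thiol/thiolate, HID/HIE/HIP for the histidine states) and standard fixed-column PDB records. The function name `set_dyad_state` and the choice of HID for the neutral histidine are illustrative assumptions on my part, not what BUDE or anyone's pipeline actually does. It only renames residues; any existing hydrogens on the dyad would still need to be stripped and rebuilt by the downstream tool.

```python
# AMBER-style residue names for the two dyad states (naming convention assumed).
DYAD_NAMES = {
    "neutral": {"CYS": "CYS", "HIS": "HID"},     # HID tautomer is an assumption
    "zwitterion": {"CYS": "CYM", "HIS": "HIP"},  # thiolate / imidazolium
}
CYS_VARIANTS = {"CYS", "CYM", "CYX"}
HIS_VARIANTS = {"HIS", "HID", "HIE", "HIP"}


def set_dyad_state(pdb_lines, state, cys_resid=145, his_resid=41):
    """Return PDB lines with the catalytic dyad renamed for the given state.

    Applies to every chain, so both protomers of a dimer are changed.
    """
    names = DYAD_NAMES[state]
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            resname = line[17:20]     # PDB columns 18-20: residue name
            resid = int(line[22:26])  # PDB columns 23-26: residue number
            if resid == cys_resid and resname in CYS_VARIANTS:
                line = line[:17] + names["CYS"] + line[20:]
            elif resid == his_resid and resname in HIS_VARIANTS:
                line = line[:17] + names["HIS"] + line[20:]
        out.append(line)
    return out


# Two made-up records with dummy coordinates, just to exercise the function.
example = [
    "ATOM      1  SG  CYS A 145      10.000  10.000  10.000  1.00  0.00",
    "ATOM      2  NE2 HIS A  41      12.000  10.000  10.000  1.00  0.00",
]
for record in set_dyad_state(example, "zwitterion"):
    print(record)
```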
Regarding “coverage” of the active site by substrates, here are the sequences of the 11 cleavage sites for SARS-CoV and SARS-CoV-2 Mpro (numbered in the "#" column). Just like the enzymes, the corresponding cleavage sites have very high sequence identity:
Virus | Source | Accession | # | P5 | P4 | P3 | P2 | P1 | P1’ | P2’ | P3’ | P4’ | P5’ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SARS-CoV | GMM/Code | NC_004718.3 | 1 | S | A | V | L | Q | S | G | F | R | K |
SARS-CoV | GMM/Code | NC_004718.3 | 2 | G | V | T | F | Q | G | K | F | K | K |
SARS-CoV | GMM/Code | NC_004718.3 | 3 | V | A | T | V | Q | S | K | M | S | D |
SARS-CoV | GMM/Code | NC_004718.3 | 4 | R | A | T | L | Q | A | I | A | S | E |
SARS-CoV | GMM/Code | NC_004718.3 | 5 | A | V | K | L | Q | N | N | E | L | S |
SARS-CoV | GMM/Code | NC_004718.3 | 6 | T | V | R | L | Q | A | G | N | A | T |
SARS-CoV | GMM/Code | NC_004718.3 | 7 | E | P | L | M | Q | S | A | D | A | S |
SARS-CoV | GMM/Code | NC_004718.3 | 8 | H | T | V | L | Q | A | V | G | A | C |
SARS-CoV | GMM/Code | NC_004718.3 | 9 | V | A | T | L | Q | A | E | N | V | T |
SARS-CoV | GMM/Code | NC_004718.3 | 10 | F | T | R | L | Q | S | L | E | N | V |
SARS-CoV | GMM/Code | NC_004718.3 | 11 | Y | P | K | L | Q | A | S | Q | A | W |
SARS-CoV-2 | GMM/Code | MN908947 | 1 | S | A | V | L | Q | S | G | F | R | K |
SARS-CoV-2 | GMM/Code | MN908947 | 2 | G | V | T | F | Q | S | A | V | K | R |
SARS-CoV-2 | GMM/Code | MN908947 | 3 | V | A | T | V | Q | S | K | M | S | D |
SARS-CoV-2 | GMM/Code | MN908947 | 4 | R | A | T | L | Q | A | I | A | S | E |
SARS-CoV-2 | GMM/Code | MN908947 | 5 | A | V | K | L | Q | N | N | E | L | S |
SARS-CoV-2 | GMM/Code | MN908947 | 6 | T | V | R | L | Q | A | G | N | A | T |
SARS-CoV-2 | GMM/Code | MN908947 | 7 | E | P | M | L | Q | S | A | D | A | Q |
SARS-CoV-2 | GMM/Code | MN908947 | 8 | H | T | V | L | Q | A | V | G | A | C |
SARS-CoV-2 | GMM/Code | MN908947 | 9 | V | A | T | L | Q | A | E | N | V | T |
SARS-CoV-2 | GMM/Code | MN908947 | 10 | F | T | R | L | Q | S | L | E | N | V |
SARS-CoV-2 | GMM/Code | MN908947 | 11 | Y | P | K | L | Q | S | S | Q | A | W |
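
To put a number on “very high sequence identity”, here is a minimal sketch that compares the two sets of cleavage sites position by position. The sequences are transcribed from the table above (P5 to P5’); the variable and function names are just illustrative.

```python
POSITIONS = ["P5", "P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'", "P5'"]

# Cleavage sites 1-11, P5..P5', transcribed from the table above.
SARS_COV = [
    "SAVLQSGFRK", "GVTFQGKFKK", "VATVQSKMSD", "RATLQAIASE",
    "AVKLQNNELS", "TVRLQAGNAT", "EPLMQSADAS", "HTVLQAVGAC",
    "VATLQAENVT", "FTRLQSLENV", "YPKLQASQAW",
]
SARS_COV_2 = [
    "SAVLQSGFRK", "GVTFQSAVKR", "VATVQSKMSD", "RATLQAIASE",
    "AVKLQNNELS", "TVRLQAGNAT", "EPMLQSADAQ", "HTVLQAVGAC",
    "VATLQAENVT", "FTRLQSLENV", "YPKLQSSQAW",
]


def site_differences(a, b):
    """List (position, residue_a, residue_b) where two aligned sites differ."""
    return [(pos, x, y) for pos, x, y in zip(POSITIONS, a, b) if x != y]


total = matches = 0
for number, (old, new) in enumerate(zip(SARS_COV, SARS_COV_2), start=1):
    diffs = site_differences(old, new)
    total += len(old)
    matches += len(old) - len(diffs)
    if diffs:
        changes = ", ".join(f"{p} {x}->{y}" for p, x, y in diffs)
        print(f"site {number}: {changes}")

print(f"overall identity: {matches}/{total}")
```

With the table as transcribed, 8 of the 11 sites are identical and 102 of 110 positions match; the differences are confined to sites 2, 7 and 11, and P1 is Gln at every site in both viruses.
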
I have just completed models of all 11 substrates in the active site of SARS-CoV-2 Mpro using the 2q6g PDB entry, which contains an inactivated mutant of SARS-CoV Mpro (H41A) bound to the most readily cleaved substrate, TSAVLQ*SGFRK. These SARS-CoV-2 Mpro + substrate models should give us insight into the relationship between the sequence specificity and the 3D shape of the pocket, plus ideas for new inhibitor designs.
I’m eager to hear everyone’s thoughts on this:
- How transferable are Paasche’s results for SARS-CoV Mpro to SARS-CoV-2 Mpro? (I’m guessing the two are similar enough that this is a reasonable assumption, but we all know the subtlest of differences can sometimes have large effects.)
- Does anyone have citations for the experimental pKa values for the Cys and His in SARS-CoV-2 Mpro?
- To what extent does Paasche’s “coverage hypothesis” hold? From my quick reading of the thesis, he looked at only one inhibitor (N9, 2amd) and one P6-P5’ substrate (TSAVLQ*SGFRK, 2q6g).
My chemical intuition says we should use the zwitterionic form for covalent inhibitors and the neutral form for non-covalent inhibitors. Using both forms kills all birds with two stones.