Area to discuss modeling and comparisons of the main protease (MPro) across different coronaviruses, and their relevance both to finding new inhibitors and to MPro as a pan-CoV drug target.
Here is some sequence alignment work I did the other night. I knew the main protease was pretty similar between SARS-CoV and SARS-CoV-2 but I didn’t know if any of the residue differences matter so I did the alignment and checked the first shell of active site residues. Main finding: In addition to 96% identity, all active site residues are strictly conserved (except for S46A on the outer periphery of S1). This means that, to a first approximation, all the prior art on SARS-CoV main protease (more than a decade!) would likely apply, plug and play, to SARS-CoV-2. Also, a drug would likely be pan-specific for SARS-type coronaviruses (and maybe even more broadly).
Mon Apr 13 22:37:55 MDT 2020
Christopher I. Bayly
We know the main protease (3CLpro) is nearly identical between SARS-CoV and
SARS-CoV-2, but how identical are they and where are the differences? Are there
any important differences in the active site? If not, then all the prior art
of SARS-CoV main protease would likely apply, plug and play, to SARS-CoV-2.
Fasta sequences of SARS-CoV-2 main protease:
>YP_009725301.1 3C-like proteinase [Severe acute respiratory syndrome coronavirus 2]
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQA
GNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSF
LNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYA
AVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRT
ILGSALLEDEFTPFDVVRQCSGVTFQ
>YP_009742612.1 3C-like proteinase [Severe acute respiratory syndrome coronavirus 2]
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQA
GNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSF
LNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYA
AVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRT
ILGSALLEDEFTPFDVVRQCSGVTFQ
Fasta sequences of SARS-CoV main protease:
>sp|P0C6X7.1|R1AB_CVHSA RecName: Full=3C-like proteinase; Short=3CL-PRO; Short=3CLp;
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDML
NPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG
SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDRQT
AQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIA
VLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTFQ
>sp|P0C6U8.1|R1A_CVHSA RecName: Full=3C-like proteinase; Short=3CL-PRO; Short=3CLp;
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDML
NPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG
SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDRQT
AQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIA
VLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTFQ
>sp|P0C6T7.1|R1A_BCRP3 RecName: Full=3C-like proteinase; Short=3CL-PRO; Short=3CLp;
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNP
NYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNGSP
SGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTEVHAGTDLEGKFYGPFVDRQTAQ
AAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVL
DMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTFQ
>sp|P0C6F8.1|R1A_BCHK3 RecName: Full=3C-like proteinase; Short=3CL-PRO; Short=3CLp;
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVVCTAEDMLNPNYDD
LLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNGSPSGVY
QCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDRQTAQAAGT
DTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCA
ALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTFQ
>sp|P0C6F5.1|R1A_BC279 RecName: Full=3C-like proteinase; Short=3CL-PRO; Short=3CLp;
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVIC
TAEDMLNPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSV
LACYNGSPSGVYQCAMRPNYTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGP
FVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLS
AQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTFQ
Which SARS-CoV sequence should I use for alignment?
To select one representative SARS-CoV sequence, a Clustal Omega run
was done on the five SARS-CoV sequences above:
CLUSTAL O(1.2.4) multiple sequence alignment
sp|P0C6F8.1|R1A_BCHK3 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVVCTAEDMLNPNYDDLLIR 60
sp|P0C6F5.1|R1A_BC279 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR 60
sp|P0C6X7.1|R1AB_CVHSA SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR 60
sp|P0C6U8.1|R1A_CVHSA SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR 60
sp|P0C6T7.1|R1A_BCRP3 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR 60
******************************************:***********:*****
sp|P0C6F8.1|R1A_BCHK3 KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
sp|P0C6F5.1|R1A_BC279 KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
sp|P0C6X7.1|R1AB_CVHSA KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
sp|P0C6U8.1|R1A_CVHSA KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
sp|P0C6T7.1|R1A_BCRP3 KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
************************************************************
sp|P0C6F8.1|R1A_BCHK3 SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK 180
sp|P0C6F5.1|R1A_BC279 SPSGVYQCAMRPNYTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK 180
sp|P0C6X7.1|R1AB_CVHSA SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK 180
sp|P0C6U8.1|R1A_CVHSA SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK 180
sp|P0C6T7.1|R1A_BCRP3 SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTEVHAGTDLEGK 180
*************:*********************************** **********
sp|P0C6F8.1|R1A_BCHK3 FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
sp|P0C6F5.1|R1A_BC279 FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
sp|P0C6X7.1|R1AB_CVHSA FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
sp|P0C6U8.1|R1A_CVHSA FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
sp|P0C6T7.1|R1A_BCRP3 FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
************************************************************
sp|P0C6F8.1|R1A_BCHK3 PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
sp|P0C6F5.1|R1A_BC279 PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
sp|P0C6X7.1|R1AB_CVHSA PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
sp|P0C6U8.1|R1A_CVHSA PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
sp|P0C6T7.1|R1A_BCRP3 PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
************************************************************
sp|P0C6F8.1|R1A_BCHK3 SGVTFQ 306
sp|P0C6F5.1|R1A_BC279 SGVTFQ 306
sp|P0C6X7.1|R1AB_CVHSA SGVTFQ 306
sp|P0C6U8.1|R1A_CVHSA SGVTFQ 306
sp|P0C6T7.1|R1A_BCRP3 SGVTFQ 306
******
Of the five, only two match the consensus residue at every position:
sp|P0C6X7.1|R1AB_CVHSA
sp|P0C6U8.1|R1A_CVHSA
The latter was chosen for the comparison with SARS-CoV-2.
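The selection step above can be sketched in a few lines of Python. This is only an illustration, not how the original choice was made: it assumes the consensus is the most common residue in each column, and it copies only alignment blocks 1-60 and 121-180, since those are the only blocks whose consensus line is not all asterisks. The names `deviations`, `BLOCKS` and `STARTS` are made up for this sketch.

```python
from collections import Counter

# Blocks 1-60 and 121-180 copied from the Clustal Omega output above.
# B1 and B3 are the versions shared by most of the five sequences.
B1 = "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR"
B3 = "SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK"
BLOCKS = {
    "P0C6F8.1": ("SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVVCTAEDMLNPNYDDLLIR", B3),
    "P0C6F5.1": (B1, "SPSGVYQCAMRPNYTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK"),
    "P0C6X7.1": (B1, B3),
    "P0C6U8.1": (B1, B3),
    "P0C6T7.1": (B1, "SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTEVHAGTDLEGK"),
}
STARTS = (1, 121)  # residue number of the first column of each block


def deviations(blocks, starts):
    """Map each sequence name to its (position, consensus, residue) mismatches."""
    out = {name: [] for name in blocks}
    for k, start in enumerate(starts):
        columns = zip(*(seqs[k] for seqs in blocks.values()))
        for offset, column in enumerate(columns):
            # Assumption: consensus = most common residue in the column.
            consensus = Counter(column).most_common(1)[0][0]
            for name, residue in zip(blocks, column):
                if residue != consensus:
                    out[name].append((start + offset, consensus, residue))
    return out


dev = deviations(BLOCKS, STARTS)
representatives = [name for name, d in dev.items() if not d]
print("match consensus everywhere:", representatives)
for name, d in dev.items():
    if d:
        print(name, d)
```

Sequences with an empty mismatch list are candidates for the representative. The other three are reported with the positions where they depart from the consensus.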
Which SARS-CoV-2 sequence should I use for alignment?
To select one of the two SARS-CoV-2 sequences above, the two
were compared in Clustal Omega:
CLUSTAL O(1.2.4) multiple sequence alignment
YP_009725301.1 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIR 60
YP_009742612.1 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIR 60
************************************************************
YP_009725301.1 KSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNG 120
YP_009742612.1 KSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNG 120
************************************************************
YP_009725301.1 SPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGN 180
YP_009742612.1 SPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGN 180
************************************************************
YP_009725301.1 FYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
YP_009742612.1 FYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
************************************************************
YP_009725301.1 PLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQC 300
YP_009742612.1 PLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQC 300
************************************************************
YP_009725301.1 SGVTFQ 306
YP_009742612.1 SGVTFQ 306
******
The two were everywhere identical so the first was chosen for alignment with SARS-CoV.
Alignment of 3CLpro (main protease) between SARS-CoV-2 and SARS-CoV
SARS-CoV-2 sequence YP_009725301.1 was aligned with SARS-CoV sequence sp|P0C6U8.1|R1A_CVHSA
in Clustal Omega with default settings:
CLUSTAL O(1.2.4) multiple sequence alignment
YP_009725301.1 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIR 60
sp|P0C6U8.1|R1A_CVHSA SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR 60
**********************************.**********:**************
|------------------------|--------------|----|--|-----------
YP_009725301.1 KSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNG 120
sp|P0C6U8.1|R1A_CVHSA KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG 120
****.********************:*:*****:**************************
------------------------------------------------------------
YP_009725301.1 SPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGN 180
sp|P0C6U8.1|R1A_CVHSA SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK 180
*************.*********************************************:
-------------------|||||------------------||||||------------
YP_009725301.1 FYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
sp|P0C6U8.1|R1A_CVHSA FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE 240
*********************:**************************************
------||||-|------------------------------------------------
YP_009725301.1 PLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQC 300
sp|P0C6U8.1|R1A_CVHSA PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC 300
**************************:*****************::**************
------------------------------------------------------------
YP_009725301.1 SGVTFQ 306
sp|P0C6U8.1|R1A_CVHSA SGVTFQ 306
******
------
I added a line underneath each Clustal Omega consensus line with a vertical bar for residues in
the first shell of the active site, dashes otherwise. The sequences are 96% identical
with only 12 residue differences (294 of 306 positions match). Active site residues are strictly conserved except
for S46A on the outer periphery of S1. This means that, to a first approximation,
all the prior art on SARS-CoV main protease would likely apply, plug and play, to SARS-CoV-2.
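For anyone who wants to re-check the counts, here is a minimal Python sketch that lists the substitutions between the two aligned sequences and flags those in the first shell. The sequences are copied from the pairwise alignment above (it has no gaps, so a position-by-position comparison is enough). `FIRST_SHELL` is my reading of the vertical-bar marker lines, so treat it as an assumption, and `substitutions` is just a helper name for this sketch.

```python
# SARS-CoV-2 (YP_009725301.1) and SARS-CoV (P0C6U8.1) 3CLpro, copied from the
# pairwise alignment above in 60-residue blocks.
SARS2 = (
    "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIR"
    "KSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNG"
    "SPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGN"
    "FYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE"
    "PLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQC"
    "SGVTFQ"
)
SARS1 = (
    "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR"
    "KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG"
    "SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGK"
    "FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE"
    "PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC"
    "SGVTFQ"
)

# Assumption: 1-based positions marked '|' in the marker lines of the alignment.
FIRST_SHELL = {1, 26, 41, 46, 49,
               *range(140, 145), *range(163, 169), *range(187, 191), 192}


def substitutions(a, b):
    """Return (position, residue_in_a, residue_in_b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


diffs = substitutions(SARS2, SARS1)
identity = 100.0 * (len(SARS2) - len(diffs)) / len(SARS2)
first_shell_diffs = [d for d in diffs if d[0] in FIRST_SHELL]

print(f"{len(diffs)} differences, {identity:.1f}% identity")
print("all substitutions:", [f"{x}{i}{y}" for i, x, y in diffs])
print("first-shell substitutions:", [f"{x}{i}{y}" for i, x, y in first_shell_diffs])
```

Substitutions are written as SARS-CoV-2 residue, position, SARS-CoV residue, which matches the S46A notation used above.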
@bayly, this is really excellent work. Thanks so much! It is an excellent addition to the long thread of info we have here: “A brief exploration of past SARS small-molecule inhibitors”.
One interesting point from the Nature paper last week is that the similarity extends beyond just SARS-CoV and SARS-CoV-2: the active site seems to be quite similar across 12 CoV strains.
Additionally, in a comment to me, @RGlen seemed to mostly agree, while noting that the mutation at position 46 may be important:
“The backbones show very similar structures. Looking at the difference between SARS and Covid-19 protease x-rays, Mutation of A to S46 pushes M49 into the binding site. Will need to check if this is going to block the inhibitors of SARS. Met-49 is now very close to where the inhibitor would be.”
In fact, the conformation of S46/M49 can change quite dramatically, as illustrated in x0354. See the snapshot here:
https://fragalysis.diamond.ac.uk/viewer/react/snapshot/03767005-3abb-4147-b7fc-5fa7ff0c2d2c
So I’m guessing this would not affect pan-CoV activity.
https://www.biorxiv.org/content/10.1101/2020.02.27.968008v2.full.pdf - This is a rather harsh comparison of SARS-CoV and SARS-CoV-2 Mpro in terms of dynamics and how that could relate to druggability. Although the site is very plastic, I still think we can find decent inhibitors. Selectivity might be an issue, though, with respect to other proteases, including endogenous ones, which tend to be promiscuous. More flexibility usually means lower binding affinity, so achieving sub-10 nM potency will be challenging.
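To put the 10 nM target in energetic terms, here is a small Python sketch using the standard relation ΔG° = RT ln(Kd), with Kd taken relative to a 1 M standard state. The temperature of 298.15 K is my assumption, and the function name is only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant.

    Uses dG = RT ln(Kd), with Kd in mol/L relative to a 1 M standard state.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


for kd in (1e-6, 100e-9, 10e-9):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

At this temperature a 10 nM binder corresponds to about -10.9 kcal/mol, and each tenfold gain in affinity costs roughly another 1.4 kcal/mol.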
(Sorry - I should have shared this earlier! I did this back in March, but it was buried in a Zotero note: https://www.zotero.org/groups/2466960/sars-cov-2-transitionstate/collections/Q2DDRX2T/items/GF4F46L4/note/FDH3AI44/collection .)
I’m working on transition-state analogue design, so I wanted to know what the peptide specificity of the SARS-CoV-2 3CL-pro is. An interesting study on SARS-CoV(-1) tries all tetrapeptide sequences in vitro with a fluorescent leaving group [1] to collect per-pocket statistics.
Alternatively, one can use bioinformatics to look at the genome cleavage points. The paper I found [2] had an online service specifically targeted at coronaviruses (still working, from 2003 - though now it appears to be offline?).
From their paper, SARS-CoV-1 (BJ01) AY278488
246 782 1 179 179 - PCP CP1
783 2699 180 818 639 TRELNG|GAVTRY PCP CP2
2700 8465 819 2740 1922 FRLKGG|APIKGV PCP CP3
8466 9965 2741 3240 500 ISLKGG|KIVSTC PCP CP4
9966 10883 3241 3546 306 AVLQ|SGFR nsp2
10884 11753 3547 3836 290 VTFQ|GKFK nsp3
11754 12002 3837 3919 83 ATVQ|SKMS nsp4
12003 12596 3920 4117 198 ATLQ|AIAS nsp5
12597 12935 4118 4230 113 VKLQ|NNEL nsp6
12936 13352 4231 4369 139 VRLQ|AGNA nsp7
13353 16147 4370 5301 932 PLMQ|SADA nsp9
16148 17950 5302 5902 601 TVLQ|AVGA nsp10
17951 19531 5903 6429 527 ATLQ|AENV nsp11
19532 20569 6430 6775 346 TRLQ|SLEN nsp12
19736 20638 6481 6781 301 PQLQ|ASEW nsp13
SARS-CoV-2 (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta), processed through ZCURVE_CoV 2.1 (Full output: https://www.zotero.org/groups/2466960/sars-cov-2-transitionstate/collections/Q2DDRX2T/items/GF4F46L4/note/FDH3AI44/collection )
266 805 1 180 180 - PCP CP1
806 2719 181 818 638 RELNGG|AYTRYV PCP CP2
2720 8554 819 2763 1945 FTLKGG|APTKVT PCP CP3
8555 10054 2764 3263 500 IALKGG|KIVNNW PCP CP4
10055 10972 3264 3569 306 AVLQ|SGFR nsp2
10973 11842 3570 3859 290 VTFQ|SAVK nsp3
11843 12091 3860 3942 83 ATVQ|SKMS nsp4
12092 12685 3943 4140 198 ATLQ|AIAS nsp5
12686 13024 4141 4253 113 VKLQ|NNEL nsp6
13025 13441 4254 4392 139 VRLQ|AGNA nsp7
13442 16236 4393 5324 932 PMLQ|SADA nsp9
16237 18039 5325 5925 601 TVLQ|AVGA nsp10
18040 19620 5926 6452 527 ATLQ|AENV nsp11
19621 20658 6453 6798 346 TRLQ|SLEN nsp12
20659 21552 6799 7096 298 PKLQ|SSQA nsp13
So very, very similar. On the P4-P1 side, only nsp9 (PLMQ->PMLQ, though that could even be a typo in the original paper) and nsp13 (PQLQ->PKLQ) differ; on the primed side, nsp3 and nsp13 also change.
This implies that the SARS-CoV-2 3CL-pro is extremely similar in activity to the previous one.
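The site-by-site comparison can be reproduced with a short Python sketch. The site strings are copied from the two listings above, written as P4-P1|P1'-P4' and keyed by the product label in the last column. The helper name `site_changes` is made up for this example.

```python
# 3CL-pro cleavage sites (P4-P1|P1'-P4'), copied from the two listings above.
SARS1_SITES = {
    "nsp2": "AVLQ|SGFR", "nsp3": "VTFQ|GKFK", "nsp4": "ATVQ|SKMS",
    "nsp5": "ATLQ|AIAS", "nsp6": "VKLQ|NNEL", "nsp7": "VRLQ|AGNA",
    "nsp9": "PLMQ|SADA", "nsp10": "TVLQ|AVGA", "nsp11": "ATLQ|AENV",
    "nsp12": "TRLQ|SLEN", "nsp13": "PQLQ|ASEW",
}
SARS2_SITES = {
    "nsp2": "AVLQ|SGFR", "nsp3": "VTFQ|SAVK", "nsp4": "ATVQ|SKMS",
    "nsp5": "ATLQ|AIAS", "nsp6": "VKLQ|NNEL", "nsp7": "VRLQ|AGNA",
    "nsp9": "PMLQ|SADA", "nsp10": "TVLQ|AVGA", "nsp11": "ATLQ|AENV",
    "nsp12": "TRLQ|SLEN", "nsp13": "PKLQ|SSQA",
}


def site_changes(old, new):
    """Return (non-primed changes, primed changes) as {product: (old, new)}."""
    nonprimed, primed = {}, {}
    for product, old_site in old.items():
        old_p, old_pp = old_site.split("|")
        new_p, new_pp = new[product].split("|")
        if old_p != new_p:
            nonprimed[product] = (old_p, new_p)
        if old_pp != new_pp:
            primed[product] = (old_pp, new_pp)
    return nonprimed, primed


nonprimed_changes, primed_changes = site_changes(SARS1_SITES, SARS2_SITES)
print("P4-P1 changes:  ", nonprimed_changes)
print("P1'-P4' changes:", primed_changes)
```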
These peptide sequences might be useful to others who are looking at substrate transition states as a target. I’ve been working towards this end with AutoDock Vina plus subselection of poses here: https://github.com/QuantumCorona/SARS-CoV2-OligopeptideTS/tree/master/0009-gen_dock_peptides
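As a rough complement to the in-vitro per-pocket statistics, one can tabulate which residues occur at each position of the predicted SARS-CoV-2 cleavage sites listed above. This is a minimal Python sketch; eleven sites is a very small sample, so the counts are only indicative. The names `SITES`, `LABELS` and `position_counts` are illustrative.

```python
from collections import Counter

# Predicted SARS-CoV-2 3CL-pro sites (P4-P1|P1'-P4') from the listing above.
SITES = [
    "AVLQ|SGFR", "VTFQ|SAVK", "ATVQ|SKMS", "ATLQ|AIAS", "VKLQ|NNEL",
    "VRLQ|AGNA", "PMLQ|SADA", "TVLQ|AVGA", "ATLQ|AENV", "TRLQ|SLEN",
    "PKLQ|SSQA",
]
LABELS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def position_counts(sites):
    """Count residues at each substrate position across all sites."""
    columns = zip(*(site.replace("|", "") for site in sites))
    return {label: Counter(column) for label, column in zip(LABELS, columns)}


counts = position_counts(SITES)
for label in LABELS:
    ranked = ", ".join(f"{aa}:{n}" for aa, n in counts[label].most_common())
    print(f"{label:>4}  {ranked}")
```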
[1] Goetz, D. H., Choe, Y., Hansell, E., Chen, Y. T., McDowell, M., Jonsson, C. B., … Craik, C. S. (2007). Substrate Specificity Profiling and Identification of a New Class of Inhibitor for the Major Protease of the SARS Coronavirus. Biochemistry, 46(30), 8744–8752. https://doi.org/10.1021/bi0621415
[2] Gao, F., Ou, H.-Y., Chen, L.-L., Zheng, W.-X., Zhang, C.-T. (2003). Prediction of proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing SARS-CoV genomes. FEBS Letters, 553(3), 451–456. https://doi.org/10.1016/S0014-5793(03)01091-3
Is there a consensus about whether there is a realistic chance of designing a selective SARS-CoV-2 inhibitor? Should the submitted molecules be reviewed and priorities decided based on selectivity? Does anyone know if the approved protease-inhibitor drugs have been tested for Covid-19 therapeutic benefit?
Some really nice work from these folks - https://www.researchgate.net/publication/341093467_Broad-spectrum_inhibition_of_coronavirus_main_and_papain-like_proteases_by_HCV_drugs
Resulting in a new crystal - https://www.rcsb.org/structure/6WNP