Submission JAR-IMP-b007c7c2

mc-robinson · May 6, 2020, 7:26pm

Topic automatically created for discussing the designs at:
https://covid.postera.ai/covid/submissions/JAR-IMP-b007c7c2

mc-robinson · May 6, 2020, 7:31pm

Hi @jarvist, I am a bit concerned that many of these structures are unstable or have many undesirable groups.

jarvist · May 6, 2020, 7:38pm

I’ve quite intentionally NOT post-processed these in any way; really, they’re here as the starting point of a collaboration with any chemists who’d like to try and smooth off the edges.

Though if you are scoring entirely by those automatic filters that now appear when you submit, I should note that inputting 2011-Akaji-35 (an established 98 nM reversible inhibitor against SARS-CoV(-1), which all of these structures are attempting to mimic) results in 9 violations! (Mostly around the active aldehyde.)

jarvist · May 6, 2020, 7:47pm

I thought I’d copy+paste the submission text into the comment stream to make it easier to read.

These structures were generated automatically using a Graph-Based Genetic-Algorithm (GA), which attempted to build mimic molecules of a reference structure. The reference structure was compound 35 from 2011Akaji, which is a tetrapeptide transition state mimic for SARS-CoV(-1), with an IC50 of 98 nM [1]. The intent of this work-package was to try and find some non-peptide mimics.

The generative part of the method is based on Jan Jensen’s Python GB-GA (https://github.com/jensengroup/GB-GA), but with a custom similarity metric that provides smooth, continuous scoring, including chemical specificity, between 3D structures. Each step of the GA evaluated 42 conformers (generated with Open Babel, scored with RMSD) of the trial compound, then maximised the 3D chemical overlap of these conformers against the reference molecule in a single 3D pose. The highest conformer score was used in the GA. The algorithm takes approximately 5 CPU seconds per molecule, and this cost is mostly linear in the number of conformers that are checked.
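As a rough illustration of the "highest conformer score" step, here is a minimal sketch. It is not the code used for this submission: `best_conformer_score`, `estimate_cpu_seconds` and the `overlap_score` callable are hypothetical names, the real 3D overlap metric is not reproduced, and the cost estimate assumes strictly linear scaling from the quoted 5 CPU seconds at 42 conformers.

```python
from typing import Callable, Iterable, TypeVar

Conformer = TypeVar("Conformer")  # placeholder for whatever represents one 3D conformer


def best_conformer_score(
    conformers: Iterable[Conformer],
    overlap_score: Callable[[Conformer], float],
) -> float:
    """Score every conformer against the reference and keep the highest score.

    `overlap_score` is a hypothetical stand-in for the 3D chemical overlap
    metric described above; it is assumed to return larger values for
    better matches.
    """
    scores = [overlap_score(conf) for conf in conformers]
    if not scores:
        raise ValueError("at least one conformer is required")
    return max(scores)


def estimate_cpu_seconds(n_conformers: int, seconds_at_42: float = 5.0) -> float:
    """Rough per-molecule cost, assuming cost is exactly linear in conformer count."""
    return seconds_at_42 * n_conformers / 42


# Toy usage: the "conformers" are just numbers and the score is the identity.
toy_best = best_conformer_score([0.1, 0.7, 0.4], lambda c: c)
```

Only the best-scoring conformer feeds back into the GA, so a flexible molecule is judged by its most favourable pose rather than by an average over poses.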

18145 separate GA streams with a population size of 50 and 100 generations were seeded with the MPro-XChem hits (i.e. all fragments) and the first 1000 molecules of ZINC, and scored with a purely electrostatic chemical similarity. The elite (highest scoring) structures from these runs were then used to seed a second round of 9525 GA streams, where the metric additionally included a score of vdW dispersion chemical specificity (scored at the best electrostatic match). These multiple objectives were scalarised by taking a weighted geometric mean (electrostatic^0.8 * dispersion^0.2). So the molecules here are the best of ~138 million evaluated molecules, after 200 generations of evolution.
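To make the scalarisation and the ~138 million figure concrete, here is a small sketch. The function names are hypothetical, and the evaluation count assumes each stream scores exactly population × generations molecules (ignoring any initial-population scoring or duplicate molecules).

```python
def scalarise(electrostatic: float, dispersion: float,
              w_elec: float = 0.8, w_disp: float = 0.2) -> float:
    """Weighted geometric mean of two non-negative similarity scores."""
    if electrostatic < 0 or dispersion < 0:
        raise ValueError("scores must be non-negative")
    return electrostatic ** w_elec * dispersion ** w_disp


def total_evaluations(streams_per_round, population=50, generations=100):
    """Molecules scored across all rounds, assuming population * generations per stream."""
    return sum(n * population * generations for n in streams_per_round)


# Round 1: 18145 streams; round 2: 9525 streams.
n_evaluated = total_evaluations([18145, 9525])  # 138_350_000
```

With a geometric mean, a molecule that scores zero on either objective scores zero overall, so a strong electrostatic match cannot fully compensate for a missing dispersion match, even though the electrostatic term carries most of the weight.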

The final outputs were simply ranked by the scoring function, and approx. 30% were discarded by a simple RDKit ‘problematic group’ filter. Clearly the algorithm has been rather keen to put lots of heteroatoms (O, N, S) in place, in order to reproduce a chemical similarity to the peptide backbone. No analysis of stability was made.

No chemist has adjusted or post-processed these structures. Clearly much is missing from the metric - and any suggestions on how to ‘fix’ a structure with some obviously problematic hetero atoms / groups would be greatly appreciated! These could then be easily re-scored, to see whether the metric still thought they were similar.

This work was done in collaboration with Kuano Ltd, and used computer time on the Imperial College Research Computing Service, DOI: 10.14469/hpc/2232.

The overall aim of this work is to characterise the transition states of the actual substrates of 3CL-pro, and then use this GB-GA generative method, together with a suitably developed metric, to directly suggest transition state analogues. Future work will be to develop the metrics to make closer analogues, and with Kuano to build a robust platform with more sophisticated synthesis likelihood & stability filters.

[1] Akaji, K., Konno, H., Mitsui, H., Teruya, K., Shimamoto, Y., Hattori, Y., … Sanjoh, A. (2011). Structure-Based Design, Synthesis, and Evaluation of Peptide-Mimetic SARS 3CL Protease Inhibitors. Journal of Medicinal Chemistry, 54(23), 7962–7973. https://doi.org/10.1021/jm200870n

mc-robinson · May 6, 2020, 11:28pm

Hi @jarvist, rest assured, the alerts are merely warnings for things to look out for. No molecule is automatically discarded based on these. Obviously, a lot of known drugs violate many of the alerts. I think the broader issue is that the compounds have groups that are unstable / very very unfamiliar to me. The alerts won’t catch these because they flag common groups in structures.

But sure, as a starting point for collaboration, that is fine.