DOCKovalent results

alphalee · April 4, 2020, 8:24am

Summary of DOCKovalent from @londonir:

To try and help visualise the binding modes of the covalent designs, we used DOCKovalent to dock each covalent design to each of the structures you indicated they were based upon.
We took into consideration chloroacetamides, acrylamides, nitriles and vinyl sulfones.

We did this quickly, so the setup is a bit dirty. Try to follow:

In the file smiles.smi you can find the mapping from an arbitrary new ID (1, 2, 3, etc.) to the SMILES and the unique design ID, e.g. “PET-SGC-fed-1”.
Each design was then converted and docked in several versions, and a suffix was added to the ID according to each:
Nitriles (suffix: NIT)
Vinyl sulfones (suffix: SUL)
Acrylamides: (suffix: ACR)
Chloroacetamides: (suffix: NCL)
We also amended your designs in two ways:

We ‘made’ in-silico the acrylamide version of each chloroacetamide (suffix: NCL2ACR)
Some of you forgot to add the chlorine and just left an acetyl group, so in such designs we docked them as either chloroacetamides (suffix: ACL) or acrylamides (suffix: ACL2ACR)

We then docked each of these to the structures indicated by you.
In the Files.zip you will find 52 directories, each one containing:
rec.pdb - that’s the protein
xtal-lig.pdb - that’s the crystallographic fragment
poses.mol2 - these are the docking results

So, for example, in directory x0104 the first docking pose in the poses.mol2 file is “38_NCL”.
This corresponds to arbitrary ID 38, which smiles.smi maps to “DAR-DIA-fb2-1”, and the NCL suffix means it was docked as a chloroacetamide.
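
To make the naming scheme concrete, here is a minimal Python sketch that splits a pose name into its arbitrary ID and suffix. The suffix table is copied from the list above; the function name `parse_pose_name` is made up for illustration, and the sketch assumes pose names always have the form `<number>_<suffix>` as in the “38_NCL” example. Looking the ID up in smiles.smi is left out because the column layout of that file is not described here.

```python
# Suffix meanings as listed in the post above.
SUFFIXES = {
    "NIT": "nitrile",
    "SUL": "vinyl sulfone",
    "ACR": "acrylamide",
    "NCL": "chloroacetamide",
    "NCL2ACR": "chloroacetamide converted in silico to acrylamide",
    "ACL": "acetyl design docked as chloroacetamide",
    "ACL2ACR": "acetyl design docked as acrylamide",
}


def parse_pose_name(name):
    """Split a pose name such as '38_NCL' into (arbitrary_id, suffix, meaning).

    Assumes the '<number>_<suffix>' form shown in the example above.
    """
    id_part, _, suffix = name.strip().partition("_")
    if not id_part.isdigit() or suffix not in SUFFIXES:
        raise ValueError(f"unrecognised pose name: {name!r}")
    return int(id_part), suffix, SUFFIXES[suffix]


print(parse_pose_name("38_NCL"))  # (38, 'NCL', 'chloroacetamide')
```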

Note 1: There were plenty of nitriles that weren’t meant to serve as a covalent warhead but were just part of the fragments - these fell victim to our quick and dirty methods and are probably docked wrong.
Note 2: DOCKovalent is built for speed and not so much for accuracy - i.e. it is a screening software rather than a ‘modelling’ software. So I would not pay too much attention to the scores, and even the poses I would take with a grain of salt. If it did recapitulate the binding pose of the fragment the design was built upon - that’s a very positive sign.
Note 3: If you load rec.pdb, xtal-lig.pdb and poses.mol2 in PyMOL, it’s very easy to scroll through the docking results and compare them to the native fragment.
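
If you just want to see which designs ended up in a given poses.mol2 without opening a viewer, something like the following Python sketch may help. It assumes the file follows the standard Tripos MOL2 convention, where the molecule name is the line right after each `@<TRIPOS>MOLECULE` record; I have not checked whether the DOCKovalent output puts the pose name there. The function name `pose_names` and the tiny example string are made up for illustration and are not real docking output.

```python
def pose_names(mol2_text):
    """Return the molecule names in a multi-molecule MOL2 string, in file order.

    Assumes the name is the line following each @<TRIPOS>MOLECULE record.
    """
    lines = mol2_text.splitlines()
    names = []
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and i + 1 < len(lines):
            names.append(lines[i + 1].strip())
    return names


# Made-up two-molecule example, only to show the record layout.
example = """@<TRIPOS>MOLECULE
38_NCL
 1 0 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.3     1  LIG1        0.0000
@<TRIPOS>MOLECULE
12_ACR
 1 0 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1          1.0000    0.0000    0.0000 C.3     1  LIG1        0.0000
"""

print(pose_names(example))  # ['38_NCL', '12_ACR']

# For a real directory, e.g.:
#     with open("x0104/poses.mol2") as fh:
#         print(pose_names(fh.read()))
```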

smiles.smi (224.5 KB)
Dock poses: https://drive.google.com/open?id=12beIOwO4pJNs6VzUUj_dpyrIqGGC4Enz

alphalee · April 4, 2020, 8:32am

Summary of shape overlay results from @JohnChodera :

covalent-docking-overlap.{csv,sdf,mol2,pdb} - scored shape overlays and total overlap volumes summed over all fragments, sorted by that sum
docked-with-overlapping-fragments/ - directory of PDB files where the first molecule is the docked molecule and the subsequent molecules in the PDB file are the highly overlapping fragments; look at this file to see how well the docked covalent molecules explore geometries from multiple fragments.

Unfortunately, since many fragments are very similar and bind in nearly identical poses, this means that many of the highest-scored docked poses simply overlap with nearly degenerate fragments. On visual inspection, even this overlap is pretty poor.

https://drive.google.com/a/choderalab.org/file/d/1r8v6Zrpm4x48oF4_QuUPObvEZoBXhn6_/view?usp=drive_web

wmccorkindale · April 4, 2020, 2:04pm

I’ve filtered out the unique covalent designs (the ones that could be read by RDKit anyway) and clustered them by Murcko scaffold; the top 20 scaffolds are shown below and a .csv of the structures and their respective scaffolds can be found here:

https://github.com/wjm41/covid-frag-analysis/blob/master/data/unique_docked_smiles_by_scaffold.csv

Edit: The .csv has now also been updated to include generic scaffolds.

wmccorkindale · April 4, 2020, 2:38pm

For an alternative view, I’ve also clustered the covalent designs by K-medoids (very similar to K-means, but each cluster ‘center’ is an actual datapoint instead of a mean). The distance measure for the clustering is Tanimoto distance using radius 3 Morgan fingerprints (1024 bits). Cluster medoids and the .csv with clusters indicated are attached:
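
For anyone unfamiliar with the idea, here is a rough, dependency-free Python sketch of K-medoids with Tanimoto distance. It is not the code used to produce the attached file. Fingerprints are represented as plain sets of on-bit indices (in practice these would come from the Morgan fingerprints mentioned above), and the naive initialisation and simple alternating update are assumptions chosen to keep the example short.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(fps, k, max_iter=100):
    """Alternate between assigning points and re-picking medoids until stable."""
    n = len(fps)
    dist = [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive initialisation: the first k points
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep the old medoid
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy fingerprints: two obvious groups of on-bits.
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {10, 12}]
medoids, labels = k_medoids(fps, k=2)
print(medoids, labels)
```

Because each medoid is an index into the input list, the representative of a cluster is always one of the actual designs, which is what makes the medoids easy to look at as structures.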

https://github.com/wjm41/covid-frag-analysis/blob/master/data/kclustered_smiles.csv