Fragmenstein: merging (continued)

matteoferla · June 18, 2020, 7:10pm

This post is about merging fragments without a reference SMILES string using Fragmenstein. It is a second post on the topic because my previous one, Fragmenstein: merging, was erroneously filed in the Docking Result category. For placing human-suggested follow-up compound SMILES against the fragments, see Fragmenstein: assessing fidelty to hits. For placing library-enumeration SMILES against a given hit or two, see Compound sets to score for how well they can recapitulate fragment mergers.

What is Fragmenstein

Fragmenstein places a given compound based on a subset of the hits, or merges hits, and then energy minimises the result, as it may be horribly distorted (hence the algorithm name). So it is not a docking program.
Reactive compounds are treated as covalent only if the warhead is one of the following: acrylamide, chloroacetamide, nitrile, vinylsulfonamide, bromoalkyne. Aurothiol and aldehyde are disabled.

Results

I made several code fixes, ran all the pairwise mergers, and then merged the top mergers that shared a hit.

Interactive 3D page of results (Michelanglo)
sdf of top 5_000 mergers (3.5 MB)
static html table with 2D images ( => 90 MB table)

How

The hits are merged based on atoms that uniquely overlap within 2 Å. But …

Rings were a problem, so each ring is collapsed into a virtual atom before merging and expanded afterwards. The bonding to the rings is determined by proximity (or remembered by the virtual atom, which is not a great route), unless it would form a triangle or square, and any atoms that are overly close to the ring get merged into it (e.g. alkane to benzene).
Merging of warheads is prevented. If different warheads are detected, the first is used and the remaining hits have their covalent warhead group removed. As a result, the merger x0771-x2705 joins at the piperidine.
Non-overlapping hits get joined via N carbons, with a fudge for rings (not projected at the moment). The carbons are not constrained.
Excessive bonds are corrected by shifting the element leftwards in the periodic table (no oxonium this time!) or by reducing the bond order. A Texas carbon (a carbon with five bonds) currently becomes sulfur, which is bad 90% of the time.
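
To make the overlap and ring-collapse steps above more concrete, here is a minimal sketch in plain Python. It is not Fragmenstein's actual code: the function names `collapse_ring` and `unique_overlaps` are hypothetical, the virtual atom is assumed to sit at the ring centroid, and "uniquely overlap" is read here as "each atom has exactly one partner within the cutoff, and that partner has no other candidate". The real implementation may resolve ambiguous cases differently.

```python
from math import dist
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def collapse_ring(coords: List[Coord], ring: List[int]) -> Coord:
    """Return the centroid of the ring atoms (assumed position of the virtual atom)."""
    n = len(ring)
    return tuple(sum(coords[i][axis] for i in ring) / n for axis in range(3))


def unique_overlaps(a: List[Coord], b: List[Coord], cutoff: float = 2.0) -> Dict[int, int]:
    """Map atom index in hit a -> atom index in hit b for one-to-one overlaps.

    A pair is kept only if each atom is the other's sole neighbour within
    `cutoff` (in Å); ambiguous atoms are left unmapped in this sketch.
    """
    a_to_b = {i: [j for j, q in enumerate(b) if dist(p, q) < cutoff]
              for i, p in enumerate(a)}
    b_to_a = {j: [i for i, p in enumerate(a) if dist(p, q) < cutoff]
              for j, q in enumerate(b)}
    return {i: js[0] for i, js in a_to_b.items()
            if len(js) == 1 and len(b_to_a[js[0]]) == 1}


# Toy coordinates (not real hits): only the last atom of hit_a sits on hit_b.
hit_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
hit_b = [(4.3, 0.0, 0.0), (7.0, 0.0, 0.0)]
mapping = unique_overlaps(hit_a, hit_b)

# A toy four-membered "ring" collapsed to its centroid.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
virtual_atom = collapse_ring(square, [0, 1, 2, 3])
```

With typical bond lengths of about 1.5 Å and a 2 Å cutoff, neighbouring atoms often fall within range of the same partner, so a strict one-to-one rule like this one discards many pairs. That is one reason a ring-collapse step before matching is useful.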

Issues

The ranking is still an issue and I will make a better weighted model.
The point at which minimisation stops is also not great —the minimisation algorithm .