PostEra

Covalent attachments with "*"

These are not results, but potentially useful data: I thought it would be handy to share the SMILES of the covalent submissions and hits.

In the GitHub dataset by @mc-robinson, the SMILES are unreacted.

Here is an unofficial list of SMILES of the submissions and the fragment hits, where the covalent attachment point is represented with a * (R element in a mol file, dummy atom in RDKit). This is actually the “normal” way that fragmented bonds are represented in RDKit.
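To make the * notation concrete, here is a minimal sketch of how RDKit treats it. The two molecules are made-up illustrations, not entries from the list: a dummy atom is simply an atom with atomic number 0, and cutting a bond with FragmentOnBonds caps both ends with dummy atoms.

```python
from rdkit import Chem

# A made-up reacted chloroacetamide, with * marking the attachment point.
mol = Chem.MolFromSmiles('*CC(=O)N1CCCC1')
# Dummy atoms are the atoms with atomic number 0.
dummy_indices = [atom.GetIdx() for atom in mol.GetAtoms() if atom.GetAtomicNum() == 0]

# Fragmenting a bond (here the C-C bond of ethanol) adds a dummy atom to each end.
ethanol = Chem.MolFromSmiles('CCO')
cc_bond = ethanol.GetBondBetweenAtoms(0, 1).GetIdx()
pieces = Chem.FragmentOnBonds(ethanol, [cc_bond])
n_dummies = sum(atom.GetAtomicNum() == 0 for atom in pieces.GetAtoms())

print(dummy_indices, n_dummies, Chem.MolToSmiles(pieces))
```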

Each SMILES has also been converted into the other three of the four main warheads (‘acrylamide’, ‘chloroacetamide’, ‘vinylsulfonamide’, ‘nitrile’). The converted entries carry a suffix (_ACR, _CHL, _VIN, _NIT).
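As a small illustration of the naming, the sketch below builds the suffixed identifiers for one compound. The helper name variant_ids is hypothetical, and I am assuming the suffix is simply appended to the submission ID; check against the spreadsheet before relying on it.

```python
# Suffix per warhead, as listed above.
WARHEAD_SUFFIXES = {
    'acrylamide': '_ACR',
    'chloroacetamide': '_CHL',
    'vinylsulfonamide': '_VIN',
    'nitrile': '_NIT',
}

def variant_ids(compound_id: str) -> dict:
    """Hypothetical helper: map each warhead name to the suffixed ID (assumes plain concatenation)."""
    return {warhead: compound_id + suffix for warhead, suffix in WARHEAD_SUFFIXES.items()}

ids = variant_ids('KAT-FAI-9b65c29e-1')
print(ids['acrylamide'])
```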

These SMILES were obvious to correct:

  • FAR-UNI-736b943a-11, [Cl]CC(=O)N1CC[C@H](NC(=O)Nc2ccc(F)cc2F)[C@H]1C(=O)N(C)CC(N)=O
  • FAR-UNI-9a76d7b5-2, [Cl]CC(=O)N1CC[C@H](NC(=O)Nc2ccc(F)cc2F)[C@H]1C(=O)N(C)CC(N)=O
  • KAT-FAI-78913094-1, Cc1cccc(C2CN(C(=O)C[Cl])CCN2c2ccccc2[N+](=O)O)c1
  • KAT-FAI-9b65c29e-1, Cc1cccc(C2CN(C(=O)C[Cl])CCN2c2ccccc2C#N)c1
  • KAT-FAI-9b65c29e-2, Cc1cccc(C2CN(C(=O)C[Cl])CCN2c2ccccc2N2CCOCC2)c1
  • KAT-FAI-9b65c29e-3, CC1CCCC(C2CN(C(=O)C[Cl])CCN2C2CCCCC2N2CCOCC2)C1

However, some submissions were not converted because I could not discern a reactive group that I recognised. Many are like this one, for example (AAR-POS-dddeddbf-5):

[Image: structure of AAR-POS-dddeddbf-5]

The only ambiguous one is JON-UIO-cc955e79-8.

The code to do the conversion is in the Fragmenstein repo, via the class Victor.

This is a gutted version of that class that does it:

from rdkit import Chem
from typing import Optional, Union

class Wictor:
    warhead_definitions = [{'name': 'nitrile',
                            'covalent': 'C(=N)*',  # zeroth atom is attached to the rest
                            'covalent_atomnames': ['CX', 'NX', 'CONN1'],
                            'noncovalent': 'C(#N)',  # zeroth atom is attached to the rest
                            'noncovalent_atomnames': ['CX', 'NX']
                            },
                           {'name': 'acrylamide',
                            'covalent': 'C(=O)CC*',  # the N may be secondary etc. so best not do mad substitutions.
                            'covalent_atomnames': ['CZ', 'OZ', 'CY', 'CX', 'CONN1'],
                            # OZ needs to tautomerise & h-bond happily.
                            'noncovalent': 'C(=O)C=C',
                            'noncovalent_atomnames': ['CZ', 'OZ', 'CY', 'CX']},
                           {'name': 'chloroacetamide',
                            'covalent': 'C(=O)C*',  # the N may be secondary etc. so best not do mad substitutions.
                            'covalent_atomnames': ['CY', 'OY', 'CX', 'CONN1'],
                            # OY needs to tautomerise & h-bond happily.
                            'noncovalent': 'C(=O)C[Cl]',
                            'noncovalent_atomnames': ['CY', 'OY', 'CX', 'CLX']
                            },
                           {'name': 'vinylsulfonamide',
                            'covalent': 'S(=O)(=O)CC*',  # the N may be secondary etc. so best not do mad substitutions.
                            'covalent_atomnames': ['SZ', 'OZ1', 'OZ2', 'CY', 'CX', 'CONN1'],  # OZ tauto
                            'noncovalent': 'S(=O)(=O)C=C',
                            'noncovalent_atomnames': ['SZ', 'OZ1', 'OZ2', 'CY', 'CX']
                            }
                           ]

    @classmethod
    def make_covalent(cls, smiles: str, warhead_name: Optional[str] = None) -> Union[str, None]:
        """
        Convert an unreacted warhead to a reacted one in the SMILES

        :param smiles: unreacted SMILES
        :param warhead_name: name in the definitions. If unspecified it will try and guess (less preferable)
        :return: reacted SMILES, or None if no warhead matches
        """
        mol = Chem.MolFromSmiles(smiles)
        if warhead_name:
            war_defs = [wd for wd in cls.warhead_definitions if wd['name'] == warhead_name]
        else:
            war_defs = cls.warhead_definitions
        for war_def in war_defs:
            ncv = Chem.MolFromSmiles(war_def['noncovalent'])
            cv = Chem.MolFromSmiles(war_def['covalent'])
            if mol.HasSubstructMatch(ncv):
                x = Chem.ReplaceSubstructs(mol, ncv, cv, replacementConnectionPoint=0)[0]
                return Chem.MolToSmiles(x)
        return None  # no warhead definition matched

    @classmethod
    def make_all_warhead_combinations(cls, smiles: str, warhead_name: str) -> Union[dict, None]:
        """
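        Swap the given unreacted warhead for every warhead in the definitions,
        in both the reacted (covalent) and unreacted (noncovalent) forms

        :param smiles: unreacted SMILES
        :param warhead_name: name in the definitions
        :return: dict of '<name>_covalent' / '<name>_noncovalent' to SMILES, or None if the warhead is not found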
        """
        mol = Chem.MolFromSmiles(smiles)
        war_def = [wd for wd in cls.warhead_definitions if wd['name'] == warhead_name][0]
        ncv = Chem.MolFromSmiles(war_def['noncovalent'])
        if mol.HasSubstructMatch(ncv):
            combinations = {}
            for wd in cls.warhead_definitions:
                x = Chem.ReplaceSubstructs(mol, ncv, Chem.MolFromSmiles(wd['covalent']),
                                           replacementConnectionPoint=0)
                combinations[wd['name'] + '_covalent'] = Chem.MolToSmiles(x[0])
                x = Chem.ReplaceSubstructs(mol, ncv, Chem.MolFromSmiles(wd['noncovalent']),
                                           replacementConnectionPoint=0)
                combinations[wd['name'] + '_noncovalent'] = Chem.MolToSmiles(x[0])
            return combinations
        else:
            return None
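For a rough idea of what the conversion does, here is a minimal standalone sketch of the same ReplaceSubstructs step, applied to one of the corrected chloroacetamides above. The function name to_covalent is hypothetical and not part of Fragmenstein. It handles a single warhead only and, unlike the class above, returns None for an unparsable SMILES.

```python
from typing import Optional
from rdkit import Chem

def to_covalent(smiles: str,
                noncovalent: str = 'C(=O)C[Cl]',
                covalent: str = 'C(=O)C*') -> Optional[str]:
    """Hypothetical helper: swap an unreacted warhead for its reacted form."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES
        return None
    ncv = Chem.MolFromSmiles(noncovalent)
    cv = Chem.MolFromSmiles(covalent)
    if not mol.HasSubstructMatch(ncv):
        return None
    # Atom 0 of the replacement is bonded to whatever the matched warhead was attached to.
    product = Chem.ReplaceSubstructs(mol, ncv, cv, replacementConnectionPoint=0)[0]
    return Chem.MolToSmiles(product)

# KAT-FAI-9b65c29e-1 from the list above.
reacted = to_covalent('Cc1cccc(C2CN(C(=O)C[Cl])CCN2c2ccccc2C#N)c1')
print(reacted)
```

The chlorine is gone in the output and a * sits where the cysteine sulfur would attach.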

I just realised that I did not add the attachment (the “Here” above is not a link…)

R_group_SMILES.xlsx (384.1 KB)