We’ve specced out an sdf format for people to contribute their computed binding poses for moonshot compounds (see below). The aim is that, within the next couple of weeks, the compound sets can all be uploaded to fragalysis (fragalysis.diamond.ac.uk) and viewed alongside the x-ray hits.
Validation script and instructions: https://github.com/xchem/sdf_check
SDF files can be uploaded to this thread if you would like them to be shared on fragalysis eventually and considered for use in triaging designed compounds.
Also attached to this post is the html for a Jupyter notebook, with some example rdkit code showing how to achieve the desired format. example_sdf_format-5.html.zip (50.2 KB)
Specification - ver_1.0
Compounds can be uploaded in one of two ways:
- A single sdf file
- A single sdf file plus the pdb files that the ligands should be loaded into in fragalysis.
In both cases the sdf file follows a standardised format, which allows the following:
- The fragments that inspired the design of each molecule can be specified
- The protein (in pdb file format) for each molecule can be specified
- Any number of ‘properties’ or ‘scores’ can be specified.
The (SDF) format is as follows:
The sdf file name will be:
<name> replaced with the name you wish to give it. e.g.
A ‘blank’ molecule will be the first in the sdf:
- This molecule will contain all of the same fields as every other molecule in the sdf, with each field holding a description of that field rather than a value.
- The 3D coordinates of this molecule can be anything - they will be ignored.
- The name (title line) of this molecule should be the file format specification version e.g. ver_1.0 (as defined in this document)
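As a rough illustration of the blank molecule described above, here is a minimal sketch that builds its SDF record as plain text using only the Python standard library. The helper name `blank_record`, the one-atom placeholder mol block, the description strings and the `score` field are all assumptions made for the example; the attached notebook shows the rdkit route.

```python
def blank_record(descriptions, version="ver_1.0"):
    """Return the text of one SDF record for the 'blank' header molecule.

    descriptions maps each property field name to a description of it.
    """
    lines = [
        version,  # title line: the format specification version
        "",       # program/timestamp line, left empty
        "",       # comment line, left empty
        "  1  0  0  0  0  0  0  0  0  0999 V2000",  # counts line: 1 atom, 0 bonds
        # Placeholder atom; the spec says these coordinates are ignored.
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
    ]
    for field, description in descriptions.items():
        # Each data item is a header line, the value, then a blank line.
        lines += [f">  <{field}>", description, ""]
    lines.append("$$$$")  # record terminator
    return "\n".join(lines) + "\n"


# Hypothetical descriptions: the three compulsory fields plus one made-up score.
descriptions = {
    "ref_mols": "fragments that inspired the design of the molecule",
    "ref_pdb": "protein to display the molecule with",
    "original SMILES": "SMILES of the compound before any computation",
    "score": "illustrative score field (not part of the spec)",
}
record = blank_record(descriptions)
print(record)
```

The computed molecules would then follow this record in the same file, each carrying the same field names.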
Every other molecule in the sdf file will be assumed to be a computed molecule, and should:
- Have the same properties as the blank molecule, but with values in place of the descriptions.
- Have a meaningful name, which will eventually be displayed in Fragalysis. Use the PostEra submission ID or, if that is not available, another meaningful name (e.g. the PDB code, name used in publication, etc.)
- Have the following three compulsory property fields:
  - ref_mols - a comma separated list of the fragments that inspired the design of the new molecule (codes as they appear in fragalysis - e.g. x0104_0,x0692_0)
  - ref_pdb - either (a) a filepath (relative to the sdf file) to an uploaded pdb file (e.g. Mpro-x0692_0/Mpro-x0692_0_apo.pdb) or (b) the code to the fragment pdb from fragalysis that should be used (e.g. x0692_0)
  - original SMILES - the original SMILES of the compound before any computation was carried out
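The rules above can be checked with a few lines of code. The following is only a sketch under stated assumptions, not the official validation script from the sdf_check repository: `parse_sdf`, `check_sdf` and `make_record` are hypothetical helpers, the mol block is a one-atom placeholder, and the molecule name `hypothetical-compound-1` is made up.

```python
REQUIRED = ("ref_mols", "ref_pdb", "original SMILES")

# One-atom placeholder mol block (two header lines, counts, atom, terminator).
MOL_BLOCK = [
    "",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
]


def make_record(title, props):
    """Build one SDF record as text (used here only to create test input)."""
    lines = [title] + MOL_BLOCK
    for field, value in props.items():
        lines += [f">  <{field}>", value, ""]
    return "\n".join(lines + ["$$$$"]) + "\n"


def parse_sdf(text):
    """Return a list of (title, {field: value}) pairs, one per SDF record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    parsed = []
    for lines in records:
        title = lines[0].strip() if lines else ""
        props, field = {}, None
        for line in lines[1:]:
            if line.startswith(">") and "<" in line:
                # Data header such as ">  <ref_mols>": take the text in <...>.
                field = line[line.index("<") + 1 : line.rindex(">")]
                props[field] = []
            elif field is not None:
                if line.strip():
                    props[field].append(line)
                else:
                    field = None  # a blank line closes the data item
        parsed.append((title, {k: "\n".join(v) for k, v in props.items()}))
    return parsed


def check_sdf(text):
    """Return a list of problems found; an empty list means none were found."""
    records = parse_sdf(text)
    if not records:
        return ["no molecules found"]
    problems = []
    blank_title, blank_props = records[0]
    if not blank_title.startswith("ver_"):
        problems.append("first molecule is not named with a spec version")
    for name in REQUIRED:
        if name not in blank_props:
            problems.append(f"blank molecule lacks {name!r}")
    for title, props in records[1:]:
        label = title or "<unnamed molecule>"
        if not title:
            problems.append("a molecule has no name")
        if set(props) != set(blank_props):
            problems.append(f"{label}: fields differ from the blank molecule")
        for name in REQUIRED:
            if not props.get(name):
                problems.append(f"{label}: missing value for {name!r}")
    return problems


blank = make_record("ver_1.0", {name: f"description of {name}" for name in REQUIRED})
good = blank + make_record(
    "hypothetical-compound-1",
    {"ref_mols": "x0104_0,x0692_0", "ref_pdb": "x0692_0", "original SMILES": "CCO"},
)
bad = blank + make_record(
    "hypothetical-compound-1",
    {"ref_mols": "x0104_0,x0692_0", "original SMILES": "CCO"},
)
print(check_sdf(good))
print(check_sdf(bad))
```

The parser only handles the simple layout shown here (one header line per data item, values ended by a blank line); for real submissions the validation script linked above remains the reference.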
example sdf file: compound-set_fragmenstein.sdf (9.9 KB)
NB: only properties with numerical (or boolean) values will be displayed in fragalysis - this will be reviewed at a later date
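To get a rough idea of which properties this applies to, a value can be tested before upload. How fragalysis itself decides what counts as numerical or boolean is not stated here, so the rule below (anything Python's `float()` accepts, or the strings "true"/"false" in any case) is purely an assumption, and `looks_displayable` is a hypothetical helper.

```python
def looks_displayable(value):
    """Guess whether a property value is numerical or boolean.

    Assumption: 'numerical' means float() can parse it, and 'boolean'
    means the text is true/false in any letter case.
    """
    text = value.strip()
    if text.lower() in ("true", "false"):
        return True
    try:
        float(text)
    except ValueError:
        return False
    return True


# Hypothetical property values for illustration.
for value in ["-7.2", "True", "x0104_0,x0692_0"]:
    print(value, looks_displayable(value))
```

Note that `float()` also accepts strings such as "nan" and "inf", which this sketch would therefore treat as numerical.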