Providing computed poses for others to look at

reskyner · April 29, 2020, 6:35pm

Specification - ver_1.2

edit: clarification on pdb.zip upload format

Compounds can be uploaded in one of two ways:

A single sdf file
A single sdf file, plus the pdb files that the ligands should be loaded into in Fragalysis.

The sdf file has the same standardised format for both options. The format is as follows:

The sdf file name will be compound-set_<name>.sdf, with <name> replaced by the name you wish to give the set, e.g. compound-set_fragmenstein.sdf
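A minimal sketch of checking a file name against this convention before uploading. The helper name `compound_set_name` is hypothetical, and the set of characters allowed in <name> is an assumption, since the specification does not say which characters are permitted.

```python
import re

# Pattern for the naming convention: compound-set_<name>.sdf
# Assumption: <name> is limited to letters, digits, '-' and '_';
# the specification does not state which characters are allowed.
FILENAME_RE = re.compile(r"compound-set_([A-Za-z0-9_-]+)\.sdf")


def compound_set_name(filename):
    """Return the <name> part of the file name, or None if it does not match."""
    match = FILENAME_RE.fullmatch(filename)
    return match.group(1) if match else None
```

For example, `compound_set_name("compound-set_fragmenstein.sdf")` gives the set name, while a file called `fragmenstein.sdf` gives `None`.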

A ‘blank’ molecule will be the first in the sdf:

This molecule will contain all of the fields used by the other molecules in the sdf, with each value being a description of that field.
The 3D coordinates of this molecule can be anything - they will be ignored.
The name (title line) of this molecule should be the file format specification version e.g. ver_1.2 (as defined in this document)
The molecule should have the following compulsory fields:
- ref_url - the url to the forum post that describes the work
- submitter_name - the name of the person submitting the compounds
- submitter_email - the email address of the submitter
- submitter_institution - the submitter's institution
- generation_date - the date that the file was generated, in the format yyyy-mm-dd
- method - a name for the method used to generate the compound poses

NB: all of the compulsory fields for the blank mol can be included for the other molecules, but they will be ignored
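
The blank molecule rules above could be checked locally with something like the sketch below. It is not the validation Fragalysis itself runs: the function names, the `ver_<n>.<n>` title pattern, and all values in `EXAMPLE_BLANK` are placeholders or assumptions. The example record uses an empty atom block purely to keep it short, which is also an assumption; the specification only says the coordinates are ignored.

```python
import re
from datetime import datetime

COMPULSORY_BLANK_FIELDS = (
    "ref_url", "submitter_name", "submitter_email",
    "submitter_institution", "generation_date", "method",
)


def parse_sdf_record(record):
    """Return (title, fields) for one SDF record given as text."""
    lines = record.splitlines()
    title = lines[0].strip() if lines else ""
    fields = {}
    i = 0
    while i < len(lines):
        # Data headers look like ">  <field_name>"
        header = re.match(r"^>.*<([^>]+)>", lines[i])
        i += 1
        if not header:
            continue
        value = []
        # The value runs until the next blank line
        while i < len(lines) and lines[i].strip():
            value.append(lines[i])
            i += 1
        fields[header.group(1)] = "\n".join(value)
    return title, fields


def is_iso_date(text):
    """True if text is a real calendar date written as yyyy-mm-dd."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_blank_molecule(record):
    """Return a list of problems found in the blank (first) molecule."""
    title, fields = parse_sdf_record(record)
    problems = []
    # Assumption: version titles follow the ver_<major>.<minor> pattern
    if not re.fullmatch(r"ver_\d+\.\d+", title):
        problems.append(f"title {title!r} does not look like a version, e.g. ver_1.2")
    for name in COMPULSORY_BLANK_FIELDS:
        if not fields.get(name, "").strip():
            problems.append(f"missing compulsory field: {name}")
    date = fields.get("generation_date", "").strip()
    if date and not is_iso_date(date):
        problems.append(f"generation_date {date!r} is not yyyy-mm-dd")
    return problems


# Placeholder values only; none of these come from a real submission.
EXAMPLE_BLANK = """ver_1.2


  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <ref_url>
https://example.org/forum-post

>  <submitter_name>
Example Submitter

>  <submitter_email>
submitter@example.org

>  <submitter_institution>
Example Institution

>  <generation_date>
2020-04-29

>  <method>
example-method

$$$$
"""
```

Calling `check_blank_molecule(EXAMPLE_BLANK)` returns an empty list when every compulsory field is present and the date is well formed.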

Every other molecule in the sdf file will be assumed to be a computed molecule, and should:

Have the same properties as the blank molecule, but with their values instead of descriptions. The ref_url field can be left blank, as it is ignored for every molecule except the blank one.
Have a meaningful name, which will eventually be displayed in Fragalysis. Use the PostEra submission ID or, if that is not available, another meaningful name (e.g. the PDB code, name used in publication, etc.)
Have the following three compulsory property fields:
- ref_mols - a comma separated list of the fragments that inspired the design of the new molecule (codes as they appear in fragalysis - e.g. x0104_0,x0692_0)
- ref_pdb - either:
  - the file path of the pdb file in the uploaded zip file:
    - Example: If you upload a file called references.zip that contains a pdb file new_protein.pdb, the corresponding path in the ref_pdb field would be references/new_protein.pdb
  - the code to the fragment pdb from fragalysis that should be used (e.g. x0692_0)
- original SMILES - the original smiles of the compound before any computation was carried out
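
As an illustration of the ref_mols and ref_pdb conventions above, here is a small sketch with hypothetical helper names. It assumes the pdb file sits at the top level of the zip, and that a ref_pdb value ending in .pdb is a path while anything else is a fragment code; the specification does not spell out either point.

```python
import posixpath


def split_ref_mols(value):
    """Split the comma separated ref_mols value into fragment codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def ref_pdb_for_upload(zip_name, pdb_name):
    """Build the ref_pdb value for a pdb file inside an uploaded zip.

    Assumption: the pdb file is at the top level of the zip.
    """
    stem = zip_name[:-4] if zip_name.lower().endswith(".zip") else zip_name
    return posixpath.join(stem, pdb_name)


def ref_pdb_is_path(value):
    """Guess whether ref_pdb is a path into the zip rather than a fragment code.

    Assumption: paths end in .pdb and fragment codes do not.
    """
    return value.strip().lower().endswith(".pdb")
```

With the example from the list, `ref_pdb_for_upload("references.zip", "new_protein.pdb")` gives `references/new_protein.pdb`, and `split_ref_mols("x0104_0,x0692_0")` gives the two fragment codes as a list.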

NB: only properties with numerical (or boolean) values will be displayed in fragalysis - this will be reviewed at a later date
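
To see in advance which of your properties are likely to be shown, a rough pre-check could look like the sketch below. How Fragalysis actually decides what counts as numerical or boolean is not stated here, so the parsing rules (and the name `displayable_value`) are assumptions.

```python
def displayable_value(text):
    """Return the value as bool, int or float if it parses as one, else None.

    Assumption: booleans are written as true/false in any letter case.
    Callers should compare the result with None, since False and 0 are valid.
    """
    stripped = text.strip()
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            pass
    return None
```

A docking score stored as `-7.2` would pass this check, whereas a text field such as a fragment code would return `None`.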