Topic automatically created for discussing the designs at:
https://covid.postera.ai/covid/submissions/4e090d3a-49b1-4b1e-b7f2-30bbdcddc86a
All compounds are from Enamine REAL.
Methods: https://docs.google.com/document/d/1IZIsCPIoNvEXNFxqCvPKpFO3_Xu_xxR5BqfIj_jTWBE
Data: https://drive.google.com/drive/folders/1h-2u-tgSUvPn5tIbdC7QIb0m5vOqXbxp
Let me know if anything else is needed!
Ahh thanks so much for this! Also, I just grabbed the submission data and couldn’t believe how many more compounds were in Enamine REAL than the last time I checked – this explains a lot!
I noticed that some of the compounds in the list did not have a REAL Space ID, but in fact all of them should, as mentioned in the notes (see below). In principle they are ordered in the same way as the submission. Please let me know if anything else is needed (e.g. an id to enamine_id mapping).
Other Notes:
Please contact me in case anything is needed. 68 Enamine IDs: PV-002404434570 PV-002459258314 Z1644866655 PV-002382989128 Z1518784792 PV-002238895725 PV-002109403469 Z3089609880 Z1374382617 Z1232257154 Z2172190193 Z3052559442 PV-001808920825 Z2143174968 PV-002412555578 Z1984028314 Z2311005603 PV-001481037714 Z1613960994 Z2915108704 Z2599591955 Z2229098908 Z2190256476 Z1549127951 PV-001480409345 PV-001857713429 Z3140017128 Z2860362360 Z1884524160 Z2771389038 Z1190447759 PV-002457915896 Z2719990570 Z2305221270 Z1937966465 Z2079085604 Z2629852456 PV-002162835997 PV-002121778426 PV-001847865278 Z1673773262 Z1935637461 Z433322264 Z296475158 PV-001960510925 Z1647177193 PV-001950395335 PV-002530906640 Z2312116536 Z3092563476 Z2167681427 Z2911721571 PV-002225479888 Z890182422 Z2313212158 Z1549140381 Z1670099933 PV-001481110633 Z1931403851 Z2708549355 Z1260022281 Z1243953601 PV-001819549722 PV-001890951694 Z2330806997 PV-001480406703 Z1177593728 Z1693576546
Thanks for pointing this out, and sorry for the issue! The problem seems to be that the ones without the relevant ID are not recognized as being in REAL Space by the Enamine API (or https://www.enaminestore.com/search), which I am querying in high throughput. I believe this may be related to the issue mentioned in Submission VIJ-CYC-1a3, which I am working with @Franca to resolve.
Can I ask where you get REAL space files for docking from? Is it BioSolveIT navigator, Enamine, or somewhere else?
I’ll try to sort this all out today!
Thanks; yes, these come from a prepared shape database (Schrödinger shape-gpu software).
It could be that the ligands were prepared (tautomer/chirality/protonation), although I thought I had stripped that information using Pipeline Pilot. I guess if we stuck to the same way of standardizing molecules as Enamine, we would find them? Let me know if you need anything.
Cheers,
Thanks. OK, I’m trying to pinpoint exactly what’s going on here, to make the triaging pipeline robust for future rounds/iterations with Enamine.
Let’s take BAR-COM-4e0-4 as an example. If I go to https://www.enaminestore.com/search and search for that compound by similarity, I do not find that exact compound, but I do get a lot of very, very similar compounds. In fact, I think almost every permutation of N and S in the five-membered ring must exist except for the one you want!
Since we are searching by similarity, even if tautomer/chirality/protonation issues exist and Enamine is not normalizing them, we should still see that compound… unless the similarity score is very, very sensitive…
Which it seems to be! Searching by substructure gives your compound. OK, that seems to clear things up, though it is quite confusing.
Thanks for the example.
OK, great. Yes, depending on the molecule, the underlying method and the standardization, similarity can be quite sensitive. Is it perhaps possible to search the Enamine REAL API by InChIKey (stripping chirality, charge, etc.)? I noticed I was able to merge my set after clearing the stereo part of the InChIKey.
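For anyone wanting to try the InChIKey idea, here is a minimal sketch of a stereo-insensitive merge. It assumes you already have standard InChIKeys for both sets (generated with whatever toolkit you use) and compares only the first 14-character block, which hashes the connectivity; the second block carries stereo and isotope information and the final character carries protonation. The function names and the IDs are hypothetical, the keys shown are the alanine keys used purely as an illustration, and this is not how the Enamine API works.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return parts[0]


def match_by_skeleton(query_keys, catalog_keys):
    """Map each query ID to the catalog IDs sharing its connectivity block."""
    index = defaultdict(list)
    for catalog_id, key in catalog_keys.items():
        index[connectivity_block(key)].append(catalog_id)
    return {
        query_id: index.get(connectivity_block(key), [])
        for query_id, key in query_keys.items()
    }


# Illustrative data only: a stereo-free key versus two stereo-specific keys.
query = {"query-1": "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"}
catalog = {
    "catalog-L": "QNAYBMKLOCPYGJ-REOHCLBHSA-N",
    "catalog-D": "QNAYBMKLOCPYGJ-UWTATZPHSA-N",
}
matches = match_by_skeleton(query, catalog)
print(matches)
```

A match on the first block can return more than one catalog entry (for example both enantiomers), so the hits would still need a manual or toolkit-based check before ordering.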
I guess these changes will get propagated to the next submissions overview?
In any case, I attached an Excel file with the IDs and SMILES (from Enamine REAL). I also noticed that I accidentally duplicated a molecule: BAR-COM-4e0-40 == BAR-COM-4e0-41 (I copy-pasted the SMILES…).
hits.xlsx (24.0 KB)
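A copy-paste duplicate like the one above could be caught before submitting with a quick check along these lines. This is only a sketch: the function name is hypothetical, the SMILES are placeholders rather than the real structures, and it compares the strings exactly, so two different spellings of the same molecule would need canonicalization with a cheminformatics toolkit first.

```python
from collections import defaultdict


def find_duplicate_structures(smiles_by_id):
    """Group compound IDs whose SMILES strings are identical."""
    groups = defaultdict(list)
    for compound_id, smiles in smiles_by_id.items():
        groups[smiles.strip()].append(compound_id)
    # Keep only the groups that contain more than one compound ID.
    return [ids for ids in groups.values() if len(ids) > 1]


# Placeholder SMILES for illustration; not the submitted structures.
submission = {
    "EXAMPLE-1": "CCO",
    "BAR-COM-4e0-40": "c1ccccc1O",
    "BAR-COM-4e0-41": "c1ccccc1O",
}
duplicates = find_duplicate_structures(submission)
print(duplicates)
```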