PostEra

An overview of duplicate designs to this point

I just went through and gathered all of the molecules that were submitted more than once. It's an interesting list (you may have to zoom in or download the PNG file!).

This molecule, for example, was chosen by 5 different people!

ELE-IMP-dfb
SAM-UNK-903
PET-SGC-f81
TAM-UNI-c14
SIM-DE -265

The CSV detailing all of the duplicated compounds can be found here: https://github.com/mc-robinson/COVID_moonshot_submissions/blob/master/duplicate_designs.csv

Hi, I have been keeping track of duplicates as part of my own efforts to develop new candidates. As of today, I count 3,722 submissions and 186 duplicates. I first convert each submission SMILES to an RDKit Mol, and then call RDKit's Chem.CanonSmiles() function to generate a canonical SMILES for the submission. I notice that there are sometimes duplicates within a single submission. I have attached a CSV file with the duplicates and the CIDs associated with them (as a .zip file, to get the upload to work).
postera-covid-submission-duplicates-2020-04-06.csv.zip (4.3 KB)
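
For anyone who wants to reproduce this kind of count, here is a minimal sketch of grouping submissions by canonical SMILES. It is not the exact script used for the attached file. The names `rdkit_canonical_smiles` and `find_duplicates` and the `(cid, smiles)` input layout are assumptions made for illustration. The RDKit import is deferred so the grouping logic can be tried with any other canonicalizer.

```python
from collections import defaultdict


def rdkit_canonical_smiles(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    # Imported lazily so the grouping code below also runs without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)  # canonical by default


def find_duplicates(submissions, canonicalize=rdkit_canonical_smiles):
    """Group (cid, smiles) pairs by canonical SMILES.

    Returns {canonical_smiles: [cid, ...]} for every structure that
    appears under more than one CID. The input layout is an assumption.
    """
    groups = defaultdict(list)
    for cid, smiles in submissions:
        key = canonicalize(smiles)
        if key is None:
            continue  # skip SMILES that could not be parsed
        groups[key].append(cid)
    return {key: cids for key, cids in groups.items() if len(cids) > 1}
```

Calling `find_duplicates(rows)` on a list of `(cid, smiles)` pairs read from the submissions file would then give one entry per duplicated structure. The number of entries is one way to count duplicates, though other counting conventions are possible.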

Hi @LAntunes, thanks. I believe most of the duplicates in your list are from users who accidentally submitted the same molecule twice; I filtered those out of my list. And I did actually remember to canonicalize, though I often forget.
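
A rough sketch of that kind of filter, keeping only structures submitted by more than one person, might look like the following. It is not the code behind the CSV above. It assumes the first two dash-separated fields of a CID identify the submitter, which is a guess from IDs like ELE-IMP-dfb. Both `submitter_of` and `cross_submitter_duplicates` are hypothetical names.

```python
def submitter_of(cid):
    """Hypothetical helper: treat the first two dash-separated fields as the submitter."""
    return "-".join(cid.split("-")[:2])


def cross_submitter_duplicates(duplicates, submitter_of=submitter_of):
    """Keep only duplicates that involve more than one distinct submitter.

    `duplicates` maps a canonical SMILES to the list of CIDs sharing it.
    """
    kept = {}
    for smiles, cids in duplicates.items():
        # A set collapses repeated submissions from the same person.
        if len({submitter_of(cid) for cid in cids}) > 1:
            kept[smiles] = cids
    return kept
```

If the ID convention turns out to be different, only `submitter_of` needs to change.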