Researchers from the University of Oxford, as part of the OpenBind consortium, have published a new dataset and predictive AI model, strengthening the data foundations needed for AI in drug discovery.
Most medicines work by binding to specific disease-related proteins in the body. Predicting which molecules will bind, and how strongly, is a central part of early-stage drug design. Although AI has transformed areas such as protein structure prediction, its impact on predicting how drugs interact with their targets has been more limited, largely because of a shortage of experimental data on these interactions.
The OpenBind consortium generates its experimental data using high-throughput pipelines at Diamond Light Source in Oxfordshire. These pipelines combine automated chemistry, robust binding measurements and crystallography, and the resulting data are processed into formats suitable for machine learning.
The new release provides detailed X-ray crystal structures of 699 compounds bound to the EV-A71 virus protein, with binding strength measurements for 601 of them. It is one of the largest public datasets for a single protein target.
