Autonomous design of chemical probes by exploiting robotic chemistry, high throughput structural biology and sensor-based biophysics
LEAD SUPERVISOR: Prof Frank von Delft, Nuffield Department of Medicine
Co-supervisor: Prof Charlotte Deane, Department of Statistics
Commercial partner: PostEra Ltd, London
This project aims to realise an algorithmic formalism that helps address a key challenge in chemical biology: the rapid design of bioactive compounds suitable as chemical probes and drug leads. Such compounds are in turn foundational to multiple MRC priorities, including antimicrobial resistance, advanced therapies, precision medicine, and mental health.
We will develop a machine learning approach that iteratively integrates experimental data from low-cost robotic organic synthesis, high-throughput crystallography (XChem), and rapid sensor-based biophysical measurements (Grating-Coupled Interferometry). The software engine will be able to suggest new molecules that are potent, synthetically tractable and have good pharmacological properties.
This approach builds on methodological discoveries made in the successful COVID Moonshot initiative, which Dr Lee and Prof von Delft co-founded. Starting with fragment screening experiments from the latter’s XChem facility at Diamond, it delivered preclinical antiviral candidates against SARS-CoV-2 main protease in 1.5 years for less than £1m [1].
Recent work by Prof von Delft demonstrates the feasibility of using robotic synthesis as part of a fragment-based probe discovery campaign. The key insight is how to use biophysical assays and crystallography to analyse crude reaction mixtures [2]. This sidesteps purification, the rate-limiting step in chemical synthesis: protein crystallography directly confirms the ligand’s chemical structure, while sensor-based biophysics provides binding kinetics.
Dr Lee and PostEra Ltd bring the machine learning angle, having developed industry-leading synthesis planning methodologies [3] and synthesis-driven molecular design algorithms, validated in the recent discovery of SARS-CoV-2 main protease inhibitors [4].
This project will address the two interrelated questions that must be answered to go from these proof-of-concept successes to a platform for routine drug discovery: how can machine learning use structural biology to suggest new molecules, and how can signal be extracted from noise in biophysical assays of crude reaction mixtures?
To learn from structural biology data, we will explore a range of computational approaches to encoding 3D information of protein-ligand complexes, and compare them against assay data, using both supervised and semi-supervised methods. The large increase in experimental 3D information that is being engineered in the von Delft group will allow a new class of descriptors to emerge.
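As a purely illustrative sketch of what "encoding 3D information" can mean, the Python snippet below builds a very simple descriptor: counts of protein-ligand atom pairs, grouped by element pair and distance shell. The function name `contact_fingerprint`, the shell boundaries and the toy coordinates are all assumptions made for this example; it is not the descriptor the project will develop.

```python
import math
from collections import Counter

def contact_fingerprint(protein_atoms, ligand_atoms, bin_edges=(0.0, 3.0, 4.5, 6.0)):
    """Count protein-ligand atom pairs per (protein element, ligand element, shell).

    Atoms are (element, (x, y, z)) tuples with coordinates in angstroms.
    Shell i covers distances in [bin_edges[i], bin_edges[i + 1]).
    The shell boundaries are illustrative choices, not validated cut-offs.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            for i in range(len(bin_edges) - 1):
                if bin_edges[i] <= d < bin_edges[i + 1]:
                    counts[(p_elem, l_elem, i)] += 1
                    break  # each pair falls in at most one shell
    return counts

# Toy coordinates (hypothetical, not taken from any real structure).
protein = [("O", (0.0, 0.0, 0.0)), ("N", (8.0, 0.0, 0.0))]
ligand = [("N", (2.8, 0.0, 0.0)), ("C", (4.0, 0.0, 0.0))]

fingerprint = contact_fingerprint(protein, ligand)
for key, value in sorted(fingerprint.items()):
    print(key, value)
```

A fixed-length vector derived from such counts could then be paired with assay read-outs as input to a supervised model; richer encodings would replace the counting step while keeping the same overall shape.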
For deconvoluting multiplexed biophysical assay data, the multiple binding constants will be modelled with Bayesian approaches to simultaneously describe the several chemical components, predicted binding affinities, and read-outs from an array of related compounds.
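To make the deconvolution idea concrete, here is a minimal Python sketch under strong simplifying assumptions: a crude mixture contains only the product and unreacted starting material, both compete for a single site at equilibrium, the reaction yield and the starting material's Kd are known, and measurement noise is Gaussian with known standard deviation. The posterior over the product's Kd is computed on a grid with a flat prior on log10(Kd). All names and numerical values are hypothetical and the data are simulated; the project's actual models would also need to treat unknown yields, kinetics and arrays of related compounds.

```python
import math
import random

def mixture_response(total_conc, yield_frac, kd_product, kd_start, r_max=1.0):
    """Equilibrium response when product and leftover starting material
    compete for one binding site (concentrations and Kd values in uM)."""
    c_product = yield_frac * total_conc
    c_start = (1.0 - yield_frac) * total_conc
    load = c_product / kd_product + c_start / kd_start
    return r_max * load / (1.0 + load)

def posterior_over_log10_kd(concs, responses, yield_frac, kd_start, sigma, grid):
    """Normalised grid posterior for log10(product Kd).

    Assumes a flat prior over the grid and Gaussian noise with known sigma.
    """
    log_post = []
    for log_kd in grid:
        kd = 10.0 ** log_kd
        sse = sum((r - mixture_response(c, yield_frac, kd, kd_start)) ** 2
                  for c, r in zip(concs, responses))
        log_post.append(-0.5 * sse / sigma ** 2)
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Simulated titration of a crude mixture (all values are assumptions).
random.seed(0)
TRUE_KD, KD_START, YIELD, SIGMA = 2.0, 200.0, 0.6, 0.02
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [mixture_response(c, YIELD, TRUE_KD, KD_START) + random.gauss(0.0, SIGMA)
             for c in concs]

grid = [-2.0 + 0.02 * i for i in range(201)]  # log10(Kd) from -2 to 2
posterior = posterior_over_log10_kd(concs, responses, YIELD, KD_START, SIGMA, grid)
mean_log_kd = sum(g * p for g, p in zip(grid, posterior))
print(f"posterior mean Kd ~ {10 ** mean_log_kd:.2f} uM")
```

The grid approach is chosen here only because it is transparent; with more unknowns per mixture, sampling-based or variational inference would be the more natural tools.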
The von Delft group and PostEra Ltd have complementary expertise, and a collaboration is essential to achieving the project goals. The von Delft group has a track record in high-throughput structural biology, robotic synthesis and biophysical assays. PostEra Ltd brings expertise in machine learning for chemistry, specifically in synthesis-driven molecule design. Prof Deane (2nd supervisor) brings deep expertise in the analysis of structural data. Finally, the chemical biology and discovery collaborations that form a large part of Prof von Delft’s research will ensure immediate use of the algorithms, and provide the student with direct real-world experience and scenarios for hardening the approach.
[1] Lee et al., “Crowdsourcing drug discovery for pandemics”, Nature Chemistry, 12, 581 (2020)
[2] Baker et al., “Rapid optimisation of fragments and hits to lead compounds from screening of crude reaction mixtures”, Communications Chemistry, 3, 122 (2020)
[3] Schwaller et al., “Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction”, ACS Central Science, 5, 1572 (2019)
[4] Morris et al., “Discovery of SARS-CoV-2 main protease inhibitors using a synthesis-directed de novo design model”, Chemical Communications, 57, 5909 (2021)
Apply using course: DPhil in Clinical Medicine