Accelerating medicinal chemistry with hypothesis-driven machine learning

The ongoing efforts in COVID antiviral discovery are a stark reminder that small-molecule drug discovery is still painfully slow. This is partly because the medicinal chemistry optimisation cycle – designing molecules, synthesising them, and feeding data from biological assays into the next round of designs – is still empirically driven. In my talk, I will discuss our progress towards using hypothesis-driven machine learning to close the design-make-test cycle: predicting molecular properties, designing optimised molecules, and ensuring that the designed molecules can be synthesised rapidly. I will show how physical and chemical understanding can be incorporated into machine learning, enabling data-driven methods to be useful in the low-data limit in which most drug discovery campaigns operate. I will illustrate our approach using examples from COVID Moonshot, an open science drug discovery project that aims to discover oral SARS-CoV-2 main protease inhibitors.