Accurate proteome-wide missense variant effect prediction with AlphaMissense

The vast majority of missense variants observed in the human genome are of unknown clinical significance. Machine learning approaches could close this variant interpretation gap by exploiting patterns in biological data to predict the pathogenicity of unannotated variants. I will discuss AlphaMissense, which combines advances from the highly accurate structure prediction model AlphaFold with population variant data to predict missense variant pathogenicity. We demonstrate state-of-the-art predictions on clinically ascertained labels and experimental benchmarks, without explicitly training on such data. Owing to this higher predictive performance, the fraction of ClinVar test variants that can be confidently classified at 90% precision rises by 25.8 percentage points (from 67.1% to 92.9%) compared with EVE, a recent well-performing unsupervised model. I will also cover aspects of model evaluation, interpretation and utility. For instance, we find that gene-level AlphaMissense scores are predictive of genes essential to cell survival, and that this property holds amongst the ~22% of smaller genes, for which methods based only on population cohort data lack the statistical power to detect essentiality reliably.

Speaker
Dr. Clare Bycroft is a research scientist at Google DeepMind with a background in human genetics. She has a particular focus on ensuring the utility of deep learning models in real-world settings. Previously, Clare worked with Genomics PLC, an Oxford-based biotech using human genetics data to propose new therapeutic targets. During her DPhil at the Wellcome Centre for Human Genetics, she curated the first tranche of the UK Biobank genotyping data.