Building FAIR Machine Learning-Ready Data on MHC-Ligand Interactions: Insights from Structural Data and ML Predictions

Major Histocompatibility Complex (MHC) molecules are a cornerstone of the adaptive immune system: they present antigens to T-cells for immune surveillance.

Although a reasonably large number of three-dimensional MHC structures have been determined experimentally, the dataset has low diversity and covers only a very small fraction of the universe of potential MHC-ligand complexes.

Despite this caveat, we can discern patterns in antigen presentation and recognition across the bulk data. If we benchmark machine learning structure prediction methods such as AlphaFold robustly, we can potentially use their predictions as synthetic data to extend this dataset.