Reliably assigning TCRs, based on their sequence alone, to private (non-shared) neoantigen targets would be a key enabling technology for personalized TCR therapy development. Multiple publications have attempted this using increasingly sophisticated deep learning models. All such efforts have relied on roughly the same limited set of publicly available labeled training data: experimentally established TCR-peptide-MHC (pMHC) reactivity pairs. Follow-up benchmarking of these published methods has shown that their apparently good prediction metrics were due to subtle information leaks between training and test sets, exacerbated by severe imbalances in the data distribution. In this work, we estimate the minimum amount of labeled training data required for models trained directly on TCR and pMHC protein sequences to generalize. This estimate exceeds the current size of the public data by a factor of 1000. We conclude that this prediction problem can be solved either through a major effort to generate more data or by using models that require little or no labeled data, such as pre-trained structure-prediction foundation models.
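The kind of data-requirement estimate referred to above can be illustrated with a generic learning-curve extrapolation. The sketch below is not the method used in this work; it assumes a hypothetical power-law relationship between held-out error and training-set size, fits it to placeholder learning-curve points, and solves for the sample size needed to reach a target error. All values (error measurements, target threshold, dataset sizes) are invented for illustration.

```python
# Illustrative only: power-law learning-curve extrapolation, NOT the estimation
# procedure used in this work. All numeric values are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Held-out error as a function of training-set size n:
    # error(n) = a * n**(-b) + c, where c is an irreducible error floor.
    return a * n ** (-b) + c

# Hypothetical learning-curve points: (number of labeled TCR-pMHC pairs, held-out error)
n_train = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
error   = np.array([0.48, 0.45, 0.41, 0.38, 0.35])

params, _ = curve_fit(power_law, n_train, error, p0=[1.0, 0.2, 0.1])
a, b, c = params

target_error = 0.15  # hypothetical error level taken to indicate generalization
if target_error > c:
    # Invert the fitted curve: n = (a / (target_error - c)) ** (1 / b)
    n_needed = (a / (target_error - c)) ** (1.0 / b)
    print(f"Estimated labeled pairs needed: {n_needed:.2e} "
          f"({n_needed / n_train.max():.0f}x the largest set in this toy example)")
else:
    print("Target error lies below the fitted irreducible floor; unreachable by adding data alone.")
```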