Reliably assigning TCRs, based on their sequence alone, to private (non-shared) neoantigen targets would be a key enabling technology for personalized TCR therapy development. Multiple publications have attempted this using increasingly sophisticated deep learning models. All such efforts have relied on roughly the same limited set of publicly available labeled training data: experimentally established TCR-peptide-MHC (pMHC) reactivity pairs. Follow-up benchmarking of these published methods has shown that their apparently good prediction metrics were due to subtle information leaks between training and test sets, exacerbated by severe imbalances in the data distribution. In this work, we estimate the minimum amount of labeled training data required for models trained directly on TCR and pMHC protein sequences to generalize. This estimate exceeds the current size of the public data by a factor of 1000. We conclude that this prediction problem can be solved either through a major effort to generate more data or by using models that require little or no labeled data, such as pre-trained structure-prediction foundation models.
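The kind of data-requirement estimate referred to above can be illustrated with a generic learning-curve extrapolation. The sketch below is not the method used in this work; it assumes a hypothetical power-law relationship between held-out error and training-set size, fits it to placeholder learning-curve points, and solves for the sample size needed to reach a target error. All values (error measurements, target threshold, dataset sizes) are invented for illustration.

```python
# Illustrative only: power-law learning-curve extrapolation, NOT the estimation
# procedure used in this work. All numeric values are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Held-out error as a function of training-set size n:
    # error(n) = a * n**(-b) + c, where c is an irreducible error floor.
    return a * n ** (-b) + c

# Hypothetical learning-curve points: (number of labeled TCR-pMHC pairs, held-out error)
n_train = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
error   = np.array([0.48, 0.45, 0.41, 0.38, 0.35])

params, _ = curve_fit(power_law, n_train, error, p0=[1.0, 0.2, 0.1])
a, b, c = params

target_error = 0.15  # hypothetical error level taken to indicate generalization
if target_error > c:
    # Invert the fitted curve: n = (a / (target_error - c)) ** (1 / b)
    n_needed = (a / (target_error - c)) ** (1.0 / b)
    print(f"Estimated labeled pairs needed: {n_needed:.2e} "
          f"({n_needed / n_train.max():.0f}x the largest set in this toy example)")
else:
    print("Target error lies below the fitted irreducible floor; unreachable by adding data alone.")
```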