Unbiased computations for MCMC-based inference of Gaussian process covariance parameters

Probabilistic kernel machines based on Gaussian Processes (GPs) are popular in several applied domains due to their flexible modelling capabilities and interpretability. In applications where quantification of uncertainty is of primary interest, it is necessary to accurately characterise the posterior distribution over GP covariance parameters.

Employing standard inference methods would require repeatedly calculating the marginal likelihood. The formidable computational challenge is that the marginal likelihood can be computed exactly only for GP models with Gaussian likelihoods applied to datasets with a limited number of input vectors (a few thousand), since each evaluation requires factorising the n-by-n covariance matrix at O(n^3) cost. For large datasets, or for GP models with non-Gaussian likelihoods, it is not possible to compute the marginal likelihood exactly, and this has motivated the research community to develop a variety of approximation techniques. Even though such approximations recover computational tractability, it is generally not possible to determine to what extent they affect the characterisation of the posterior distribution over GP covariance parameters.
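To make the computational bottleneck concrete, the following is a minimal sketch (not taken from the talk) of the exact log marginal likelihood of a GP regression model with a Gaussian likelihood, using an assumed squared-exponential kernel. The Cholesky factorisation of the n-by-n covariance matrix is the O(n^3) step that limits exact inference to a few thousand inputs.

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential covariance matrix (one common kernel choice).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_log_marginal_likelihood(X, y, lengthscale, variance, noise):
    # log p(y) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2*pi),
    # with K = K_f + noise * I.  The Cholesky factorisation below
    # costs O(n^3) time and O(n^2) memory: the scalability bottleneck.
    n = X.shape[0]
    K = rbf_kernel(X, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))       # 1/2 log|K| via Cholesky
            - 0.5 * n * np.log(2.0 * np.pi))
```

Standard MCMC over covariance parameters would call a function like this once per proposal, which is what becomes infeasible for large n or non-Gaussian likelihoods.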

In this talk, I will present the work I carried out over the past few years in the direction of developing Markov chain Monte Carlo (MCMC)-based inference methods for GP models that do not require the exact calculation of the marginal likelihood, but yield samples from the correct posterior distribution over covariance parameters. These “noisy” MCMC methods rely only on either unbiased estimates of the marginal likelihood or stochastic gradients (unbiased estimates of the gradient of the logarithm of the marginal likelihood). I will illustrate ways of obtaining these estimates and demonstrate how they contribute to the development of practical and scalable MCMC methods to carry out inference of GP covariance parameters. Finally, I will demonstrate the effectiveness of these MCMC approaches on several benchmark datasets and on a multiple-class multiple-kernel classification problem with neuroimaging data.
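The first family of methods mentioned above, which substitutes an unbiased estimate of the marginal likelihood into the acceptance ratio, is in the spirit of pseudo-marginal Metropolis-Hastings. The sketch below is an illustrative, generic implementation (not the specific algorithm of the talk); the function names and the random-walk proposal are assumptions. The key detail is that the noisy estimate for the current state is stored and reused, never recomputed, which is what makes the chain target the exact posterior.

```python
import numpy as np

def pseudo_marginal_mh(log_lik_estimate, log_prior, theta0, n_iters, step, rng):
    """Pseudo-marginal Metropolis-Hastings (illustrative sketch).

    log_lik_estimate(theta, rng) must return the log of a non-negative,
    unbiased estimate of the marginal likelihood p(y | theta).
    """
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_z = log_lik_estimate(theta, rng)  # noisy estimate, kept fixed
    samples = []
    for _ in range(n_iters):
        # Random-walk Gaussian proposal (an assumption for this sketch).
        prop = theta + step * rng.normal(size=theta.shape)
        log_z_prop = log_lik_estimate(prop, rng)
        # MH ratio uses the stored estimate for the current state:
        # re-estimating log_z here would break exactness.
        log_a = (log_z_prop + log_prior(prop)) - (log_z + log_prior(theta))
        if np.log(rng.uniform()) < log_a:
            theta, log_z = prop, log_z_prop
        samples.append(theta.copy())
    return np.array(samples)
```

Despite the noise in the likelihood estimates, the marginal distribution of `theta` under this chain is the exact posterior, at the price of potentially sticky mixing when the estimator variance is large.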