Join Zoom Meeting
aarhusuniversity.zoom.us/j/63280023773
Meeting ID: 632 8002 3773
Many data: combining different study designs for better causal inference
Abstract:
Randomized controlled trials (RCTs) are the gold standard for causal inference and play a central role in modern evidence-based medicine. However, they are expensive to run, and the sample sizes they use are often too limited to provide adequate power for drawing causal conclusions. In contrast, observational data are becoming increasingly accessible in large volumes, but can be subject to bias as a result of hidden confounding. This has prompted growing interest in data fusion methods that integrate disparate data resources.
Much of this literature focuses on integrating a single lower-quality dataset with an RCT, and this includes the ‘power likelihood’ method (Lin et al., 2025). This method works by raising the likelihood of the observational data to a power, which is estimated using a loss function. Unfortunately, this method is very computationally intensive, and does not scale to multiple datasets. Here we present the Multi-Dimensional Power Likelihood (MDPL), a generalization of the single-dataset power likelihood method that allows us to combine a primary, high-quality study with any number of secondary datasets. This is a quasi-empirical Bayesian method which adapts to different datasets by optimizing an objective function to determine learning rates that effectively regulate information from each secondary source. We validate our approach against a range of other leading methods through extensive simulation studies. We demonstrate reduced error in average and conditional treatment effect estimates in multiple different contexts.
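To make the idea of a power likelihood concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a normal model with known variance and a flat prior on a single mean, so that raising a secondary dataset's likelihood to a power eta simply scales that dataset's effective sample size by eta. The learning rates are chosen by grid search on a cross-validated predictive loss for the primary data, which is an illustrative stand-in for the objective function used in the talk. All function names (`power_posterior`, `cv_loss`, `select_etas`) are hypothetical.

```python
import itertools
import numpy as np

def power_posterior(primary, secondaries, etas, sigma2=1.0):
    """Posterior mean and variance of a normal mean (flat prior, known
    variance sigma2) when secondary likelihood k is raised to etas[k]."""
    # Raising a normal likelihood to the power eta scales its sample size by eta.
    weights = [len(primary)] + [eta * len(s) for eta, s in zip(etas, secondaries)]
    means = [np.mean(primary)] + [np.mean(s) for s in secondaries]
    total = sum(weights)
    post_mean = sum(w * m for w, m in zip(weights, means)) / total
    return post_mean, sigma2 / total

def cv_loss(primary, secondaries, etas, sigma2=1.0, n_folds=5):
    """Negative log predictive density of held-out primary observations
    (an assumed objective, chosen here only for illustration)."""
    folds = np.array_split(np.arange(len(primary)), n_folds)
    loss = 0.0
    for held_out in folds:
        train = np.delete(primary, held_out)
        mean, var = power_posterior(train, secondaries, etas, sigma2)
        pred_var = sigma2 + var
        resid = primary[held_out] - mean
        loss += np.sum(0.5 * np.log(2 * np.pi * pred_var) + 0.5 * resid**2 / pred_var)
    return loss

def select_etas(primary, secondaries, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search for one learning rate per secondary dataset."""
    candidates = itertools.product(grid, repeat=len(secondaries))
    return min(candidates, key=lambda etas: cv_loss(primary, secondaries, etas))

# Simulated example: a small unbiased primary study, one unbiased and one
# biased secondary dataset (all values here are made up for illustration).
rng = np.random.default_rng(0)
primary = rng.normal(1.0, 1.0, size=50)
secondaries = [rng.normal(1.0, 1.0, size=2000), rng.normal(4.0, 1.0, size=2000)]
etas = select_etas(primary, secondaries)
estimate, variance = power_posterior(primary, secondaries, etas)
print("selected learning rates:", etas)
print("estimate:", estimate, "posterior variance:", variance)
```

With eta equal to 0 a secondary dataset is ignored, and with eta equal to 1 it is pooled at full weight. The grid search grows exponentially in the number of secondary datasets, so it only serves to show the structure of the problem.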
We will briefly mention a further generalization of this idea, using arbitrary loss functions rather than the negative log-likelihood. We illustrate the effectiveness of our methods through a real-world data fusion study, augmenting the PIONEER 6 clinical trial with a US health claims dataset.
References
Gruen, A., Lin, X., Tarp, J. M., and Evans, R. J. ManyData: A multiple dimensional power likelihood to combine observational and experimental data. Under review, 2025.
Lin, X., Tarp, J. M., and Evans, R. J. Combining experimental and observational data through a power likelihood. Biometrics, 81(1), ujaf008, 2025.
Biography
Robin Evans is a Professor of Statistical Science at the University of Oxford. His research interests include causal inference, multivariate and graphical models, latent variable models, and algebraic statistics. He is particularly interested in causal simulation, and in how evidence from different kinds of study might be combined. His previous work has been applied to systems biology, quantum information theory, and the social sciences.