External validation of clinical prediction models using big data from e-health records or IPD meta-analysis: opportunities and challenges

Clinical prediction models predict disease presence and outcome occurrence in individuals [1], thereby informing diagnosis and prognosis. After a model is developed, its robustness and generalisability should be verified in one or more external validation studies. External validation uses new participant-level data, external to that used for model development, to examine whether the model’s predictions are reliable in individuals from the population(s) in which it may be used clinically. Unfortunately, there are relatively few external validation studies, which is often attributed to the lack of external data available immediately after model development. However, researchers increasingly have access to ‘big’ data, as evidenced by meta-analyses using individual participant data (IPD) from multiple studies [2], and by analyses of databases and registry data containing e-health records for thousands or even millions of patients from multiple practices, hospitals, or countries. In this presentation, we illustrate why big data heralds an exciting opportunity to improve the uptake of external validation research [3]. In particular, it allows a model’s predictive performance (e.g. in terms of discrimination and calibration) and clinical utility (e.g. in terms of net benefit) to be evaluated across different clinical settings, populations, and subgroups of intended use. Using real examples (including a new evaluation of QRISK2), we show that simply reporting a model’s overall performance (averaged across all clusters and individuals) can mask deficiencies. Rather, meta-analysis techniques such as funnel plots and prediction intervals can be used to summarise the distribution of performance across clusters, and to identify settings in which a model is not adequate or requires recalibration (updating) before implementation.
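To make the idea of summarising the distribution of performance concrete, the following is a minimal sketch (not the analysis used in the presentation) of a random-effects meta-analysis of cluster-specific C-statistics. It assumes a DerSimonian-Laird estimate of between-cluster variance, pooling on the logit scale, and a normal quantile for the prediction interval; a t-distribution with k-2 degrees of freedom is commonly recommended and would give a wider interval when there are few clusters. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def summarise_c_statistics(c_stats, std_errors, level=0.95):
    """Pool cluster-specific C-statistics (hypothetical helper).

    Returns the summary C-statistic, its confidence interval, an
    approximate prediction interval for a new cluster, and tau^2
    (between-cluster variance on the logit scale).
    """
    k = len(c_stats)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two clusters, one SE per C-statistic")

    # Work on the logit scale; variances follow from the delta method.
    y = [logit(c) for c in c_stats]
    v = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_stats, std_errors)]

    # Fixed-effect weights and Cochran's Q, needed for DerSimonian-Laird.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c_term = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c_term)

    # Random-effects summary and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    # ASSUMPTION: normal quantile instead of t with k-2 degrees of freedom.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_ci = z * se_mu
    half_pi = z * math.sqrt(tau2 + se_mu ** 2)

    return {
        "summary": inv_logit(mu),
        "ci": (inv_logit(mu - half_ci), inv_logit(mu + half_ci)),
        "pi": (inv_logit(mu - half_pi), inv_logit(mu + half_pi)),
        "tau2": tau2,
    }


# Made-up C-statistics and standard errors for five clusters (illustration only).
example = summarise_c_statistics(
    [0.78, 0.81, 0.74, 0.85, 0.69],
    [0.02, 0.03, 0.025, 0.04, 0.03],
)
print(example)
```

The confidence interval describes uncertainty in the average performance, whereas the prediction interval describes the range of performance expected in a new setting; a wide prediction interval despite a good summary estimate is the kind of deficiency that an overall average can mask.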

1. Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, Riley RD, Hemingway H, Altman DG. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013; 10: e1001381.
2. Debray TPA, Riley RD, Rovers MM, Reitsma JB, Moons KGM. Individual Participant Data (IPD) Meta-analyses of Diagnostic and Prognostic Modeling Studies: Guidance on Their Use. PLoS Med 2015; 12: e1001886.
3. Riley RD, Ensor J, Snell KI, Debray TP, Altman DG, Moons KG, Collins GS. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2016; 353: i3140.

Richard Riley is a Professor of Biostatistics at Keele University, which he joined in October 2014. He previously held posts at the Universities of Birmingham, Liverpool and Leicester. His current role focuses on statistical and methodological research for prognosis and meta-analysis, whilst supporting clinical projects in these areas. He is also a Statistics Editor for the BMJ and a co-convenor of the Cochrane Prognosis Methods Group. He co-leads a summer school in Prognosis Research Methods and leads a number of statistical training courses for risk prediction and meta-analysis.