Improving Heart Failure Risk Prediction Using Electronic Health Records and Deep Learning

Abstract:
Primary prevention of heart failure (HF) is an increasing clinical priority, driven by the rising burden of non-ischaemic cardiovascular disease and the availability of effective preventive therapies. The Predicting Risk of Cardiovascular Disease Events HF equation (PREVENT-HF) is the first guideline-endorsed tool for HF risk prediction, but it lacks large-scale external validation. We evaluated PREVENT-HF in more than 10 million individuals from two UK cohorts: the Clinical Practice Research Datalink (CPRD), a nationally representative primary care population, and UK Biobank (UKB). We additionally developed TRisk-HF, a Transformer-based survival model using longitudinal electronic health records (EHRs), to identify additional prognostic signals for HF risk in routine care data. In CPRD, PREVENT-HF demonstrated strong discrimination (concordance index [C-index] 0.840, 95% CI 0.837–0.843) and good calibration (slope 1.06, 1.05–1.07). TRisk-HF achieved higher discrimination (C-index 0.868, 0.866–0.871). Explainability analyses highlighted atrial fibrillation, chronic obstructive pulmonary disease, and alcohol use disorders as important contributors to HF risk. Incorporating these variables into PREVENT-HF improved performance (C-index 0.850, 0.847–0.853), substantially narrowing the gap with the Transformer model. Results were directionally consistent in UKB. These findings establish PREVENT-HF as a robust foundation for HF risk prediction and demonstrate a pragmatic, data-driven approach to refining clinical risk equations using information already embedded in longitudinal EHRs.
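
For readers less familiar with the discrimination metric quoted above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the evaluation code used in the study: the function name `concordance_index` and the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied follow-up times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (O(n^2) sketch).

    times  : follow-up time per individual
    events : 1 if HF was observed, 0 if censored
    risks  : predicted risk score (higher = higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on an individual with an observed event
        for j in range(n):
            # Comparable pair: i had the event strictly before j's follow-up ended
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four individuals, the third is censored
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-indices reported above should be read.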

Short Bio:
Zhengxian Fan is a DPhil student working at the intersection of artificial intelligence and clinical medicine, using large-scale electronic health record data.