(Not) Aggregating Data

This lecture will be hosted on Zoom. Please complete the short registration form on our website to receive the joining instructions.

The ability to generate, access and combine multiple sources of data presents both opportunity and challenge for statistical science. An exemplar phenomenon is the charge to collate all relevant data for the purposes of comprehensive control and analysis. However, this ambition is often thwarted by the relentless expansion in volume of data, as well as issues of data provenance, privacy and governance. Alternatives to creating ‘the one database to rule them all’ are emerging. An appealing approach is the concept of federated learning, also known as distributed analysis, which aims to analyse disparate datasets in situ. In this presentation, I will discuss some case studies that have motivated our interest in federated learning, review the statistical and computational issues involved in the development of such an approach, and outline our recent efforts to understand and implement a federated learning model in the context of the Australian Cancer Atlas.