Big Data Ethics Forum: Fake news? Epistemic and ethical challenges for synthetic datasets
Synthetic data are data generated by an algorithm to have properties similar to real data. They may be useful whenever real data are too sensitive, too valuable, or too limited to meet research needs. Synthetic data may be used, for example, to teach population health science without releasing real patient data to students, or to develop statistical analyses while avoiding Gelman and Loken’s “garden of forking paths”. But the more closely synthetic data replicate the properties and patterns of real data, that is, the more realistic they are, the greater the risk that they fail to achieve some of these objectives. Information contained in synthetic data could be used to learn about the real data on which they are based, potentially risking participant privacy or exhausting the utility of the real data for hypothesis testing. Conversely, attempts to reduce bias in certain machine learning models by augmenting real data with synthetic data may be defeated by a lack of realism. The ethical and epistemic problems that might motivate the use of synthetic data, and the issues that may affect how they are generated and used, will be presented for discussion.
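A minimal sketch of the underlying idea, assuming Python with NumPy (the toy two-variable dataset and variable names are assumptions for illustration only, not the speakers’ method): fit a simple statistical model to real data and sample synthetic records that mimic its means and correlations. Even this toy generator makes the trade-off above concrete, since the more faithfully the fitted model tracks the real data, the more it can reveal about them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy "real" dataset: two correlated numeric variables
# (e.g. age and systolic blood pressure), purely illustrative.
real = rng.multivariate_normal(
    mean=[50.0, 130.0],
    cov=[[100.0, 40.0], [40.0, 225.0]],
    size=1000,
)

# Fit a simple generative model: estimate the mean vector and
# covariance matrix of the real data.
mean_hat = real.mean(axis=0)
cov_hat = np.cov(real, rowvar=False)

# Sample synthetic records from the fitted distribution. These
# share the real data's first- and second-order statistics but
# contain no actual participant records.
synthetic = rng.multivariate_normal(mean_hat, cov_hat, size=1000)

print("real mean:      ", real.mean(axis=0).round(1))
print("synthetic mean: ", synthetic.mean(axis=0).round(1))
print("real corr:      ", np.corrcoef(real, rowvar=False)[0, 1].round(2))
print("synthetic corr: ", np.corrcoef(synthetic, rowvar=False)[0, 1].round(2))
```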
Date: 4 February 2020, 14:15 (Tuesday, 3rd week, Hilary 2020)
Venue: Big Data Institute (NDM), Old Road Campus, OX3 7LF
Venue Details: Seminar Room 0
Speakers: Ben Cairns (University of Oxford), Angeliki Kerasidou (University of Oxford)
Organising department: Ethox Centre
Organiser: Christa Henrichs (Wellcome Centre for Ethics and Humanities)
Part of: Ethox Centre Seminars
Booking required?: Not required
Audience: Members of the University only
Editors: Graham Bagley, Hannah Freeman