Veridical Data Science for biomedical discovery: detecting epistatic interactions with epiTree

Please note this event will be hosted on Zoom.

“A.I. is like nuclear energy — both promising and dangerous” — Bill Gates, 2019.

Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the “dangers” of A.I. To mitigate these dangers as far as possible, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allow one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, bringing us a step closer to veridical Data Science.
In this lecture, we will illustrate the PCS framework through epiTree, a pipeline to discover epistatic interactions from genomics data. epiTree addresses issues of scaling of penetrance through decision trees, significance calling through PCS p-values, and combinatorial search over interactions through iterative random forests (which are a special case of PCS). Using UK Biobank data, we validate the epiTree pipeline through an application to the red-hair phenotype, where several genes are known to display epistatic interactions.

Date: 18 February 2021, 15:30
Venue: Venue to be announced
Speaker: Professor Bin Yu (UC Berkeley)
Organising department: Department of Statistics
Organisers: Beverley Lane (Department of Statistics, University of Oxford), Dr Robin Evans (Department of Statistics, University of Oxford)
Organiser contact email address: events@stats.ox.ac.uk
Hosts: Professor Yee Whye Teh (Department of Statistics, University of Oxford), Dr Robin Evans (Department of Statistics, University of Oxford)
Part of: Distinguished Speaker Seminar
Booking required?: Required
Booking url: https://www.stats.ox.ac.uk/events/distinguished-speaker-seminar-thursday-18th-february-2021/
Audience: Members of the University only
This talk features in the following public collections:
- Talks of Interest to Medical Sciences
Editor: Beverley Lane