A Hitchhiker's Guide to Big Data-Based Biomarkers

Abstract: Researchers use “big molecular and imaging data” together with machine learning (ML) algorithms to build predictive models for specific diseases or phenotypes. Unfortunately, most of these models perform worse when tested on unseen data, often because of over-fitting. One possible mitigation of this pervasive problem is to embed prior biological knowledge directly into the decision rules, for example in the form of interacting gene/feature pairs and networks. Constraining the models in this way reduces over-fitting and yields more consistent and robust predictive signatures. This approach also enhances the translational value of the derived classifiers by suggesting candidate causal explanations for the disease phenotypes. In conclusion, embedding biological mechanisms into statistical learning holds promise for moving the field towards personalised health care.

Bio: Luigi Marchionni, M.D., Ph.D., is Associate Professor and Vice-Chair for Computational and Systems Pathology at Weill Cornell Medicine. Prof. Marchionni works in close collaboration with “wet lab” researchers to uncover molecular contributions to interesting cancer phenotypes. His current research focuses on knowledge integration across different “omics” and imaging data types, the development of novel prediction algorithms for cancer prognostication and therapy selection, and the integration of “omics-based” predictors into the current clinical management of cancer patients.