BDI Seminar - Three principles of data science: Predictability, computability, and stability (PCS)

In this talk, I will discuss the importance of, and connections among, the three principles of data science named in the title, and introduce the PCS workflow for the data science life cycle. PCS will be demonstrated in the context of two collaborative projects, one in neuroscience and one in genomics. The first project uses transfer learning to integrate convolutional neural networks (CNNs) fitted on ImageNet with regression methods, providing predictive and stable characterizations of neurons from the challenging visual cortex area V4. Our DeepTune characterization provides a rich description of the diverse V4 selection patterns. The second project proposes iterative random forests (iRF), a stabilized version of random forests (RF), to seek predictable and interpretable high-order interactions among biomolecules. For an enhancer status prediction problem in Drosophila based on high-throughput data, iRF found 20 stable gene-gene interactions, of which 80% had been physically verified in the literature over the past few decades. Last but not least, the results from both projects provide experimentally testable hypotheses, and hence PCS can also serve as a scientific recommendation system for follow-up experiments.