Computational Workflows for Molecular Analytics: Integrating and Contextualizing Omics Data

Reproducible, high-throughput data analysis in the omics domains requires combining software tools into automated workflows. A complete workflow captures the experimental design and guides the analysis from raw data all the way to final statistical analysis and visualization. In mass spectrometry-based proteomics, a typical workflow may contain steps such as format conversion, retention time alignment, calibration, feature extraction, peptide identification, validation, protein inference, peptide and protein quantitation, enrichment analysis, and projection of the results onto biological networks. The data can be integrated with existing datasets in public repositories and contextualized by mining the biomedical literature. This talk will focus on the software tools that perform these operations: how to find those that are fit for purpose, how to assemble them into workflows, and how to benchmark both the individual tools and the workflows. Applications include the integration of genome-wide SNP, RNA-Seq, and proteomics data; the optimization of targeted mass spectrometry assays; the identification of biological species and tissues; and machine learning-based visualizations that provide novel insights into proteomics or metabolomics data.