Graph Analytics of Clinical Multi-omics Data

The development of omics technologies such as genomics, proteomics and metabolomics has notably supported the analysis and interpretation of biological systems, especially in the context of disease. These technologies provide high-resolution, high-quality data, allowing a more comprehensive and holistic view of such systems. However, the complexity of diseases and biological processes has shown that a single omics dimension is often not sufficient to understand the underlying mechanisms. This, together with the generation of ever larger omics datasets and the scattering of the associated knowledge across different resources, opens new challenges in the analysis, integration and interpretation of these data.

To address these challenges, graph-based approaches are well suited to systematically integrate and consolidate all available omics types and their annotations, providing a comprehensive biological framework in which to investigate large-scale multi-omics data. One strength of graphs is that they treat data points, or nodes, not as individual entities but as related components, which helps uncover hidden patterns and emergent behaviours. Graph theory provides the underlying mathematical framework to model and interrogate such structures.

Following this approach, we propose a system that integrates multi-omics data and information from a variety of relevant biomedical databases into a Knowledge Graph. For example, in our system a protein identified in a proteomics experiment is linked to all its related components (other proteins, diseases, drugs, etc.) and their relationships. The graph thus facilitates the interpretation of data and the inference of meaning by providing relevant biological context. Furthermore, all data are collected and harmonized in a single platform, which facilitates data analysis. The standardized structure of the graph makes all the information rapidly accessible and makes it possible to automate information retrieval, analytics and global correlation analyses. Ultimately, this could accelerate the integration and interpretation of experimental omics data and, in the clinical context, translate into actionable results.
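
The idea of a protein node carrying its biological context can be illustrated with a minimal Python sketch. This is not the implementation of the system described here; it is a toy in-memory graph written under stated assumptions. The class `KnowledgeGraph`, its methods, the relationship names (`INTERACTS_WITH`, `ASSOCIATED_WITH`, `TARGETS`) and the identifiers (`PROTEIN_A`, `DISEASE_X`, `DRUG_Y`) are all hypothetical placeholders and do not refer to real entities or to any particular database schema.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory graph with typed nodes and typed, undirected edges.

    Hypothetical illustration only; a real system would typically rely on
    a dedicated graph database and a curated schema.
    """

    def __init__(self):
        self.node_types = {}            # node id -> node type
        self.edges = defaultdict(list)  # node id -> [(relationship, neighbor id)]

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, source, relationship, target):
        # Store the edge in both directions so that the context of a node
        # can be retrieved starting from either end.
        self.edges[source].append((relationship, target))
        self.edges[target].append((relationship, source))

    def context(self, node_id):
        """Return the direct neighbors of a node, grouped by neighbor type."""
        grouped = defaultdict(list)
        for relationship, neighbor in self.edges.get(node_id, []):
            grouped[self.node_types[neighbor]].append((relationship, neighbor))
        return dict(grouped)


# Placeholder nodes: none of these identifiers refer to real entities.
kg = KnowledgeGraph()
for node_id, node_type in [
    ("PROTEIN_A", "Protein"),
    ("PROTEIN_B", "Protein"),
    ("DISEASE_X", "Disease"),
    ("DRUG_Y", "Drug"),
]:
    kg.add_node(node_id, node_type)

kg.add_edge("PROTEIN_A", "INTERACTS_WITH", "PROTEIN_B")
kg.add_edge("PROTEIN_A", "ASSOCIATED_WITH", "DISEASE_X")
kg.add_edge("DRUG_Y", "TARGETS", "PROTEIN_A")

# Retrieve the biological context of a protein identified in an experiment.
protein_context = kg.context("PROTEIN_A")
print(protein_context)
```

Grouping neighbors by type mirrors how an identified protein can be read together with its related proteins, diseases and drugs. Because every node and relationship follows the same structure, the same lookup could in principle be applied automatically to every protein in a dataset.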