MultiMAP: dimensionality reduction of multiple datasets by manifold approximation and projection

Multimodal datasets are growing rapidly in single-cell genomics, as well as in other fields of science and engineering. We introduce MultiMAP, an approach for dimensionality reduction and integration of multiple datasets. MultiMAP embeds multiple datasets into a shared space so as to preserve both the manifold structure of each dataset independently and the manifold structure in shared feature spaces. MultiMAP builds on the rich mathematical foundation of UMAP, generalizing it to the setting of more than one data manifold. MultiMAP can be used both for visualization of multiple datasets and as an integration approach that enables subsequent joint analyses. Compared to other integration methods for single-cell data, MultiMAP is not restricted to a linear transformation, is extremely fast, and is able to leverage features that may not be present in all datasets. We apply MultiMAP to the integration of a variety of single-cell transcriptomics, chromatin accessibility, methylation, and spatial data, and show that it outperforms current approaches in run time, label transfer, and label consistency. On a newly generated single-cell ATAC-seq and RNA-seq dataset of the human thymus, we use MultiMAP to integrate cells across pseudotime. This enables the study of chromatin accessibility and transcription factor (TF) binding over the course of T cell differentiation.