Classifying transcriptional and genetic heterogeneity in single-cell measurements

The ability to evaluate cell heterogeneity is particularly critical in cancer therapy, where the presence of phenotypically distinct subclonal populations fuels relapse and resistance to treatment. The transcriptional heterogeneity within such tumors and its impact on disease progression are poorly understood. Furthermore, the extent to which genetic and transcriptional subpopulations correspond to each other cannot currently be assessed. To investigate these questions, we have developed methods for analyzing single-cell RNA-seq data in concert with other genomic information. To characterize transcriptional subpopulations, we identify annotated or newly discovered gene sets that are associated with statistically significant heterogeneity within the measured collection of cells. To infer genotype information, we rely on probabilistic assessment of single-nucleotide variants and copy number variation in individual cells, which can be used to distinguish genetically distinct subclonal populations. We apply this approach to investigate the relationship between genetic and transcriptional heterogeneity within tumors of multiple myeloma patients, demonstrating significant transcriptional differences among subclones, including persistent transcriptional features that distinguish metastasis-associated clones in the context of a pre-metastatic primary tumor.
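
As a rough illustration of what a probabilistic per-cell variant assessment could look like, the sketch below computes the posterior probability that a single cell carries a heterozygous variant, given its reference and alternate read counts at one site. It is not the method described above: the function name, the simple two-hypothesis binomial model, and all default parameters (prior, sequencing error rate, expected alternate-allele fraction) are assumptions chosen for illustration. In particular, it ignores allelic dropout, which is common in single-cell RNA-seq and would need to be modeled in practice.

```python
import math


def variant_posterior(alt_reads, ref_reads, prior=0.5,
                      error_rate=0.01, alt_fraction=0.5):
    """Posterior probability that a cell carries a heterozygous variant.

    Hypothetical two-hypothesis model (an assumption, not the authors' method):
      - variant present: each read shows the alt allele with prob. alt_fraction
      - variant absent:  alt reads arise only from errors, with prob. error_rate
    The binomial coefficient is identical under both hypotheses and cancels.
    """
    # Log-likelihood of the observed counts under each hypothesis
    log_l_var = (alt_reads * math.log(alt_fraction)
                 + ref_reads * math.log(1.0 - alt_fraction))
    log_l_ref = (alt_reads * math.log(error_rate)
                 + ref_reads * math.log(1.0 - error_rate))

    # Combine with the prior in log space, then normalize stably
    log_var = math.log(prior) + log_l_var
    log_ref = math.log(1.0 - prior) + log_l_ref
    shift = max(log_var, log_ref)
    w_var = math.exp(log_var - shift)
    w_ref = math.exp(log_ref - shift)
    return w_var / (w_var + w_ref)


if __name__ == "__main__":
    # (alt, ref) read counts for three hypothetical cells at one site
    for alt, ref in [(0, 0), (5, 5), (0, 20)]:
        print(alt, ref, variant_posterior(alt, ref))
```

With no coverage the posterior simply equals the prior, which is the behavior one wants for the many uncovered sites in sparse single-cell data. Cells could then be grouped into candidate subclones by comparing such posteriors across multiple sites.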