Compositional data analysis: A fresh approach


This talk will be held both in person and online.

The analysis of composition is a highly active area of applied multivariate statistics, in fields that span the physical and biological sciences, the social sciences and the humanities. The composition of a multi-part entity is a fixed-sum vector representing the relative amount of each part in the whole entity. A few examples: in electoral politics, the vote shares for different political parties; in geology, the percentages of different minerals in rock samples; in sociology and other behavioural sciences, the “time budgets” of individuals, e.g., the fractions of the day spent sleeping, working, sedentary, physically active, etc.; in biology, the relative prevalence of different microbes in the human gut.
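
As a minimal illustration of the fixed-sum definition above, the following Python sketch rescales raw amounts so that the parts sum to 1. The helper name `closure` and the time-budget figures are hypothetical, chosen only for illustration; they do not come from the talk.

```python
import numpy as np

def closure(amounts):
    """Rescale non-negative amounts so the parts sum to 1 (hypothetical helper)."""
    x = np.asarray(amounts, dtype=float)
    if np.any(x < 0) or x.sum() <= 0:
        raise ValueError("amounts must be non-negative with a positive total")
    return x / x.sum()

# Made-up daily time budget in hours: sleeping, working, sedentary, active.
hours = [8.0, 8.0, 6.0, 2.0]
shares = closure(hours)  # relative amounts; the vector now has a fixed sum of 1
```

Only the relative sizes of the parts survive this rescaling: measuring the same day in minutes instead of hours would give the same `shares`.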

The statistical literature in this area is dominated by work done in the 1980s by John Aitchison, including the highly influential 1986 book “The Statistical Analysis of Compositional Data”. The essence of Aitchison’s approach is to transform composition vectors to a set of contrasts among logarithms of the data, and then to work with the standard tools of multivariate statistics, such as multivariate normal distributions and linear models. In this talk, I challenge that approach, in terms of both its foundational principles and some well-known practical difficulties. I advocate a more flexible approach, based on suitably targeted statistical models rather than on data transformation.
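
To make the phrase "contrasts among logarithms" concrete, here is a minimal Python sketch of one widely used member of the log-ratio family, the centred log-ratio (clr) transform, which subtracts the mean of the logged parts from each logged part. The abstract does not name a specific transform, so the choice of clr, the function name `clr` and the vote shares are assumptions made for illustration.

```python
import numpy as np

def clr(composition):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    x = np.asarray(composition, dtype=float)
    if np.any(x <= 0):
        # The logarithm is undefined for zero (or negative) parts.
        raise ValueError("clr requires strictly positive parts")
    log_x = np.log(x)
    return log_x - log_x.mean()

# Made-up vote shares for three parties.
votes = np.array([0.5, 0.3, 0.2])
z = clr(votes)  # coordinates are contrasts among the logs and sum to zero
```

The transformed vector `z` is unchanged if `votes` is multiplied by any positive constant, which is why such transforms depend only on relative amounts. The sketch rejects zero parts, since the logarithm of zero is undefined.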