Random partitions of finite sets play a key role in modelling genetic diversity. The basic problem is to draw statistical inference about the general population when only a sample partition into species is observable. Thanks to Kingman’s theory of exchangeable random partitions of countable sets, whereby partitions are modelled by sampling from a random discrete distribution, mathematical models are greatly simplified by assuming that the population itself is a sample from an idealized infinite population. In population genetics, the sample values may carry additional characteristics of the species. For example, in Moran’s model with infinitely many alleles, such a characteristic encodes the relative ages of the species, and the question of interest is how to order the species by age, given their observed frequencies in the sample. Donnelly & Tavaré (1986) proved that in the GEM model (which leads to the famous Ewens sampling formula), the distribution of the order by age is the same as that of the order by appearance. In my talk, I will show that in a two-parameter generalization of the GEM model, and more generally under so-called Gibbs sampling, these two orders have different distributions, which are nevertheless connected via a modification of the stochastic procedure known as size-biased ordering.
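A minimal Python sketch of some of the objects named above, under stated assumptions. It takes the two-parameter generalization to be the standard GEM(alpha, theta) stick-breaking law with V_i ~ Beta(1 - alpha, theta + i*alpha) (alpha = 0 gives the one-parameter GEM(theta) model behind the Ewens sampling formula). It draws a sample partition by i.i.d. sampling from one realisation of the random weights and lists block sizes in order of first appearance. It also shows plain size-biased ordering and the Ewens sampling formula. The names `GemSampler`, `partition_by_appearance`, `size_biased_order` and `ewens_probability` are hypothetical. The sketch does not implement the order by age or the modified size-biased procedure from the talk.

```python
import bisect
import math
import random
from collections import Counter


class GemSampler:
    """Draw species labels 1, 2, ... i.i.d. from ONE realisation of GEM(alpha, theta).

    Assumed stick-breaking form: V_i ~ Beta(1 - alpha, theta + i * alpha) and
    P_i = V_i * (1 - V_1) * ... * (1 - V_{i-1}), with 0 <= alpha < 1, theta > -alpha.
    Sticks are broken lazily, so the infinite sequence is never truncated.
    """

    def __init__(self, alpha, theta, rng):
        if not (0 <= alpha < 1 and theta > -alpha):
            raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
        self.alpha, self.theta, self.rng = alpha, theta, rng
        self.cum = []          # cum[i - 1] = P_1 + ... + P_i, stored as 1 - remaining
        self.remaining = 1.0   # mass of the stick not yet broken off

    def _extend(self):
        i = len(self.cum) + 1
        v = self.rng.betavariate(1 - self.alpha, self.theta + i * self.alpha)
        self.remaining *= 1 - v
        self.cum.append(1 - self.remaining)

    def draw(self):
        u = self.rng.random()
        while not self.cum or self.cum[-1] <= u:  # break sticks until u is covered
            self._extend()
        return bisect.bisect_right(self.cum, u) + 1  # 1-based species index


def partition_by_appearance(sampler, n):
    """Sample n individuals; return block sizes in order of first appearance."""
    rank = {}    # species index -> rank of its first appearance
    sizes = []
    for _ in range(n):
        s = sampler.draw()
        if s not in rank:
            rank[s] = len(sizes)
            sizes.append(0)
        sizes[rank[s]] += 1
    return sizes


def size_biased_order(freqs, rng):
    """Size-biased permutation: repeatedly pick a remaining item with
    probability proportional to its (positive) size."""
    items = list(freqs)
    out = []
    while items:
        j = rng.choices(range(len(items)), weights=items)[0]
        out.append(items.pop(j))
    return out


def ewens_probability(sizes, theta):
    """Ewens sampling formula (alpha = 0): probability of a partition of
    n = sum(sizes) with these block sizes (order irrelevant),
    n! / theta^(n) * prod_j theta^(a_j) / (j^(a_j) * a_j!)."""
    n = sum(sizes)
    a = Counter(sizes)                               # a[j] = number of blocks of size j
    rising = math.prod(theta + i for i in range(n))  # theta (theta + 1) ... (theta + n - 1)
    p = math.factorial(n) / rising
    for j, aj in a.items():
        p *= theta ** aj / (j ** aj * math.factorial(aj))
    return p
```

For example, `partition_by_appearance(GemSampler(0.0, 1.0, random.Random(1)), 10)` returns the block sizes of a sample of 10 in order of appearance. With alpha = 0, the unordered sizes follow the Ewens sampling formula with theta = 1. Lazy stick-breaking avoids truncation, at the cost of occasionally breaking many sticks when alpha is close to 1.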
This is joint work with Jim Pitman (Berkeley); see doi:10.1214/17-EJP59 and doi:10.1214/17-ECP95.