Evolutionary Encoding and the Ancestry of Everyone

Inferring the evolutionary history of the genome is a fundamental problem in evolutionary biology. For sexually reproducing species, however, the genomic history, or ancestry, of individuals in a population is complicated by the fact that different regions of the genome have different histories. We have developed a technique (“tsinfer”) for inferring evolutionary trees from genetic variants at every point in the genome. The method scales to millions of individuals, with accuracy comparable to full likelihood methods such as ARGweaver, and even outperforms them in some cases, such as selective sweeps. It yields an “evolutionary encoding” of genetic variation data, allowing us to store genomes in a succinct format suitable for rapid, genome-wide evolutionary analyses.
In this talk I will briefly outline our evolutionary encoding technique and inference methodology, present the patterns of deep ancestry revealed in the 1000 Genomes and Simons Genome Diversity projects, and show results from extending the analysis to the million genomes from the UK Biobank. I will also discuss the current limitations of our approach and our ongoing work on extending ancestral inference to historical patterns in space and time.