The importance of family-based sampling for biobanks

Large-scale population-based samples of genotyped individuals have transformed our understanding of the genetic and environmental causes of health and disease. A major advantage of molecular genetic data over other observational data is that they can support causal inference: the random transmission of genetic variants from parents to offspring can be exploited as a natural experiment, an approach known as Mendelian randomization.

Most molecular genetic studies have relied on samples of unrelated individuals, with relatedness viewed primarily as a technical complication and potential source of bias. Mendelian randomization studies in these samples implicitly assume that the random allocation of variants within families also holds at the population level. However, a growing body of evidence from genotyped siblings and parent–offspring trios suggests that this assumption is often violated, leaving population-based estimates susceptible to bias from population structure, assortative mating, and dynastic effects. These problems are compounded in more diverse samples, and ever-larger samples of unrelated individuals will only yield more precise, but equally biased, estimates.

In this talk, I will first summarise evidence from family-based studies demonstrating the magnitude and pervasiveness of these biases. I will then argue that expanding the collection and use of family-based molecular genetic data is essential for improving causal inference in genetics. Finally, I will discuss the strengths and limitations of specific family-based designs, and what they can and cannot tell us about the causes of health and disease.

Join the meeting online: shorturl.fm/Yb0nG