An Isaac Newton Institute Workshop

Recent Advances in Statistical Genetics and Bioinformatics

Population structure and eigenanalysis

14th December 2006

Author: Nick Patterson (Broad Institute)

Abstract

When analyzing genetic data, one often wishes to determine whether the samples are from a population that has structure. Can the samples be regarded as randomly chosen from a homogeneous population, or does the data imply that the population is not genetically homogeneous? We show that an old method (principal components) can be combined with modern statistics (Tracy-Widom theory) to yield a fast and effective answer to this question. The technique is simple and practical on the largest datasets, and can be applied both to biallelic genetic markers and to highly polymorphic markers such as microsatellites. The theory also allows us to estimate the data size needed to detect structure if our samples are in fact from two populations that have a given but small level of differentiation.
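
The idea in the abstract can be illustrated with a minimal Python sketch for biallelic markers. It is not the author's implementation, and everything specific in it is an assumption that the abstract does not state: the function names, the 0/1/2 genotype coding, the per-marker normalization, the centering and scaling constants (taken here from the standard large-sample result for the largest eigenvalue of a Wishart matrix), and the Balding-Nichols style simulation used to generate test data. The returned statistic would be compared against published Tracy-Widom tables to obtain a p-value, a step the sketch leaves out.

```python
import numpy as np


def normalize_genotypes(genotypes):
    """Center and scale each marker column of an (individuals x markers) matrix.

    Assumes genotypes are coded as 0/1/2 copies of one allele (an assumption
    of this sketch). Monomorphic markers carry no information and are dropped.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # estimated allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    return (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def top_eigenvalue_statistic(genotypes):
    """Return the centered and scaled largest eigenvalue of the sample covariance.

    Under a homogeneous population this statistic is expected to behave
    approximately like a Tracy-Widom variate; large positive values suggest
    structure. The centering and scaling constants are assumptions of this sketch.
    """
    x = normalize_genotypes(genotypes)
    m, n = x.shape                           # m individuals, n usable markers
    cov = x @ x.T / n                        # m x m covariance between individuals
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    m_eff = m - 1                            # column centering removes one dimension
    eigvals = eigvals[:m_eff]
    scaled_top = m_eff * eigvals[0] / eigvals.sum()
    a, b = np.sqrt(n - 1), np.sqrt(m_eff)
    mu = (a + b) ** 2 / n
    sigma = (a + b) / n * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return (scaled_top - mu) / sigma


def simulate_genotypes(rng, sizes, n_markers, fst):
    """Hypothetical helper: simulate one block of individuals per population.

    With fst == 0 every population shares the ancestral allele frequencies;
    otherwise each population draws its own frequencies from a beta
    distribution centered on the ancestral ones (Balding-Nichols style).
    """
    ancestral = rng.uniform(0.1, 0.9, size=n_markers)
    blocks = []
    for size in sizes:
        if fst > 0.0:
            scale = (1.0 - fst) / fst
            freq = rng.beta(ancestral * scale, (1.0 - ancestral) * scale)
        else:
            freq = ancestral
        blocks.append(rng.binomial(2, freq, size=(size, n_markers)))
    return np.vstack(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    homogeneous = simulate_genotypes(rng, [60], 2000, 0.0)
    structured = simulate_genotypes(rng, [30, 30], 2000, 0.05)
    print("homogeneous:", top_eigenvalue_statistic(homogeneous))
    print("structured: ", top_eigenvalue_statistic(structured))
```

In this sketch the covariance is taken between individuals rather than between markers, so the eigenproblem stays small when markers far outnumber samples. Dividing by the sum of the eigenvalues makes the statistic insensitive to the overall scale of the normalized data.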