PCA in population genetics
This is a great Nature Genetics paper from 2008 that my labmate Owen showed me. The punchline is that you have to be careful when interpreting the results of principal component analysis (PCA):
Interpreting principal component analyses of spatial population genetic variation
Nature Genetics 40, 646 – 649 (2008)
John Novembre & Matthew Stephens
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events.
Because the basis for these interpretive guidelines is unclear, we performed simulations to investigate whether such specific migration events are necessary to explain the observed patterns. Specifically, we performed PCA on data simulated under equilibrium population genetic models without range expansions, assuming a constant homogeneous short-range migration process across both time and (two-dimensional) space. The results showed highly distinctive structure. For example, the first two PC maps show large-scale orthogonal gradients, and the next two show ‘saddle’ and ‘mound’ patterns.
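To see why no migration event is needed, here is a minimal sketch. It is not the paper's simulation: Novembre & Stephens simulated genetic data, whereas this code assumes an idealized covariance between demes on an 8 by 6 grid, K = (I + cL)^-1, where L is the grid's nearest-neighbor graph Laplacian and c is a coupling strength. The nearest-neighbor coupling stands in for homogeneous short-range migration, and it makes covariance decay with distance. The function names `grid_laplacian` and `spatial_pcs` and the default parameter values are hypothetical. Centering each SNP across demes double-centers the deme-by-deme covariance, so its leading eigenvectors give each deme's PC score, i.e. the PC maps.

```python
import numpy as np


def grid_laplacian(nx, ny):
    """Graph Laplacian of an nx-by-ny grid of demes linked to their 4 nearest neighbors."""
    n = nx * ny
    lap = np.zeros((n, n))
    for x in range(nx):
        for y in range(ny):
            i = x * ny + y  # deme (x, y) -> row index, matching reshape(nx, ny)
            for xx, yy in ((x + 1, y), (x, y + 1)):
                if xx < nx and yy < ny:
                    j = xx * ny + yy
                    lap[i, j] = lap[j, i] = -1.0
                    lap[i, i] += 1.0
                    lap[j, j] += 1.0
    return lap


def spatial_pcs(nx=8, ny=6, coupling=5.0, k=4):
    """Top-k PC maps for a hypothetical distance-decaying covariance on a grid."""
    n = nx * ny
    # Assumed covariance: nearest-neighbor coupling (a stand-in for short-range
    # migration) makes covariance between demes decay smoothly with distance.
    cov = np.linalg.inv(np.eye(n) + coupling * grid_laplacian(nx, ny))
    # Centering each SNP across demes double-centers the deme-by-deme covariance.
    centering = np.eye(n) - np.ones((n, n)) / n
    cov_c = centering @ cov @ centering
    vals, vecs = np.linalg.eigh(cov_c)  # ascending eigenvalues, unit eigenvectors
    order = np.argsort(vals)[::-1][:k]
    # Each eigenvector holds one PC score per deme; reshape it into a map of the grid.
    return vals[order], [vecs[:, i].reshape(nx, ny) for i in order]


if __name__ == "__main__":
    vals, maps = spatial_pcs()
    for rank, (val, pc_map) in enumerate(zip(vals, maps), start=1):
        print(f"PC{rank} (eigenvalue {val:.3f})")
        # Sign of each deme's score: printed rows are y, columns are x.
        for row in pc_map.T:
            print(" ".join("+" if z > 0 else "-" for z in row))
```

Because this covariance is built from the grid Laplacian, its eigenvectors are exactly products of cosines: PC1 is a gradient along the long axis, PC2 an orthogonal gradient along the short axis, and PC3 a saddle (the product of the two gradients). The shapes come from the geometry alone. The rectangular grid is deliberate: on a square grid the first two eigenvalues tie, so the solver may return any rotation of the two gradients.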