PCA in population genetics

This is a great Nature Genetics paper from 2008 that my labmate Owen showed me. The punchline is that you have to be careful when interpreting the results of principal component analysis:

Interpreting principal component analyses of spatial population genetic variation
Nature Genetics 40, 646–649 (2008)
John Novembre & Matthew Stephens

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events.

Because the basis for these interpretive guidelines is unclear, we performed simulations to investigate whether such specific migration events are necessary to explain the observed patterns. Specifically, we performed PCA on data simulated under equilibrium population genetic models without range expansions, assuming a constant homogeneous short-range migration process across both time and (two-dimensional) space. The results showed highly distinctive structure. For example, the first two PC maps show large-scale orthogonal gradients, and the next two show ‘saddle’ and ‘mound’ patterns.
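
To get a feel for why this happens, here is a minimal sketch in Python. It is not the authors' simulation: instead of simulating genotypes, it assumes that the covariance between populations on a square grid simply decays exponentially with distance (a stand-in I chose for homogeneous short-range migration), centers that covariance, and takes its leading eigenvectors as "PC maps". The names `pc_maps` and `gradient_fraction`, the grid size, and the decay scale are all hypothetical choices for illustration.

```python
import numpy as np


def pc_maps(side=15, scale=10.0, n_pcs=4):
    """Leading eigenvectors of a centered distance-decay covariance, as side x side maps.

    Assumption: cov(i, j) = exp(-distance(i, j) / scale). This is an
    illustrative model, not the simulation used in the paper.
    """
    ys, xs = np.mgrid[0:side, 0:side]
    coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

    # Pairwise Euclidean distances between grid locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cov = np.exp(-dist / scale)

    # Double-center the covariance, mimicking the mean removal done in PCA.
    n = cov.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    cov_centered = centering @ cov @ centering

    # eigh returns eigenvalues in ascending order; keep the largest n_pcs.
    eigvals, eigvecs = np.linalg.eigh(cov_centered)
    order = np.argsort(eigvals)[::-1][:n_pcs]
    maps = eigvecs[:, order].T.reshape(n_pcs, side, side)
    return coords, eigvals[order], maps


def gradient_fraction(coords, maps):
    """Fraction of the centered x and y coordinate vectors captured by PC1 and PC2.

    On a square grid PC1 and PC2 share (nearly) the same eigenvalue, so each
    one alone may be a rotated gradient; the pair is compared as a subspace.
    """
    basis = maps[:2].reshape(2, -1)  # two orthonormal eigenvectors as rows
    centered = coords - coords.mean(axis=0)
    fractions = []
    for axis in range(2):
        v = centered[:, axis]
        proj = basis @ v
        fractions.append(float((proj ** 2).sum() / (v ** 2).sum()))
    return fractions


if __name__ == "__main__":
    coords, eigvals, maps = pc_maps()
    print("leading eigenvalues:", eigvals)
    print("share of x and y gradients in PC1/PC2:", gradient_fraction(coords, maps))
```

No migration event is put into this model, yet the first two maps should come out close to plain linear gradients across the grid. Plotting `maps[2]` and `maps[3]` (for example with matplotlib's `imshow`) is a quick way to look at the higher-order patterns.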

