Centroids

I’ve been experimenting recently with NumPy, the Python package for fast and intuitive array operations. As an exercise, I tried implementing the “nearest shrunken centroid method” of Tibshirani, Hastie, Narasimhan and Chu (2002), as found in the Prediction Analysis for Microarrays (PAM) R package. The shrunken centroids classifier is a method for dealing with large numbers of noisy features. Like penalized regression, it winnows the features down to a useful subset.

In the case of gene expression data, the algorithm calculates class centroids, then shrinks each gene of the class centroids towards the overall centroid by a certain threshold. A gene whose class centroids are all shrunk onto the overall centroid no longer helps to distinguish the classes, so it effectively drops out of the classifier. The threshold is chosen by cross-validation, to find the smallest subset of genes that still gives good predictive accuracy. The link above has a good description and the original paper. Some graphs of the output using Matplotlib:

shrunken4

shrunken5
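
For anyone who wants to see the shrinkage step in code, here is a minimal NumPy sketch of the idea as I understand it from the paper. It is not the code that produced the plots above, and the function names (`fit_shrunken_centroids`, `predict_shrunken_centroids`) and the `delta` threshold argument are my own illustrative choices. It assumes a samples-by-genes array `X` and a label array `y`, and that no gene has zero variance.

```python
import numpy as np

def fit_shrunken_centroids(X, y, delta):
    """Sketch of nearest shrunken centroids. X: (samples, genes), y: labels."""
    classes = np.unique(y)
    n, _ = X.shape
    n_classes = len(classes)

    overall = X.mean(axis=0)                                   # overall centroid
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    counts = np.array([(y == k).sum() for k in classes])

    # Pooled within-class standard deviation for each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - n_classes))
    s0 = np.median(s)                                          # guards against tiny s
    m = np.sqrt(1.0 / counts - 1.0 / n)                        # per-class scaling

    # Standardized difference between each class centroid and the overall one.
    scale = m[:, None] * (s + s0)[None, :]
    d = (centroids - overall) / scale

    # Soft thresholding: shrink towards zero by delta, clipping at zero.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    shrunken = overall + scale * d_shrunk
    priors = counts / n
    return classes, shrunken, s + s0, priors

def predict_shrunken_centroids(X, classes, shrunken, s_adj, priors):
    """Assign each sample to the class with the smallest discriminant score."""
    diff = (X[:, None, :] - shrunken[None, :, :]) / s_adj
    scores = (diff ** 2).sum(axis=2) - 2.0 * np.log(priors)
    return classes[np.argmin(scores, axis=1)]
```

With `delta=0` the shrunken centroids are just the ordinary class centroids; as `delta` grows, more genes have every class centroid pulled onto the overall centroid and stop influencing the prediction. In practice you would try a grid of `delta` values and pick one by cross-validation.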
