I’ve recently been experimenting with NumPy, a Python package with fast, intuitive operations on arrays. As a test, I tried implementing the “nearest shrunken centroid method” of Tibshirani, Hastie, Narasimhan and Chu (2002), from the Prediction Analysis for Microarrays (PAM) R package. The shrunken centroids classifier is designed for data with large numbers of noisy features. In some respects it resembles penalized regression, in that it winnows the features down to a subset of useful ones.

For gene expression data, the algorithm first computes a centroid for each class. It then shrinks each gene's class centroids towards the overall centroid by a threshold. A gene whose class centroids all shrink onto the overall centroid no longer affects classification, so choosing the threshold by cross-validation identifies the smallest subset of genes that still gives good predictive accuracy. The link above has a good description and the original paper. Some graphs of the output, made with Matplotlib:
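Here is a minimal NumPy sketch of those steps. It is my reading of the method, not the pamr implementation, and the function names `fit_shrunken_centroids`, `predict` and `cv_error` are hypothetical. Differences between class and overall centroids are standardized by the pooled within-class standard deviation plus s0 (the median of those deviations). They are then soft-thresholded by `delta`, and new samples are assigned to the nearest shrunken centroid, with a log-prior correction. As assumptions, the scaling factor uses sqrt(1/n_k − 1/n), the variant given in The Elements of Statistical Learning, and the cross-validation folds are assigned at random rather than stratified by class.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Sketch of a nearest shrunken centroids fit.

    X: (n_samples, n_genes) array; y: (n_samples,) class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                      # sorted class labels
    n, _ = X.shape
    K = len(classes)

    overall = X.mean(axis=0)                                          # (p,)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])   # (K, p)
    n_k = np.array([(y == k).sum() for k in classes])                 # (K,)

    # Pooled within-class standard deviation of each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    s0 = np.median(s)              # keeps genes with tiny s from dominating

    # Standardized distance of each class centroid from the overall centroid.
    m = np.sqrt(1.0 / n_k - 1.0 / n)[:, None]   # assumed ESL-style scaling, (K, 1)
    scale = m * (s + s0)                        # (K, p)
    d = (centroids - overall) / scale

    # Soft thresholding: move each d towards zero by delta, stopping at zero.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    return {
        "classes": classes,
        "shrunk": overall + scale * d_shrunk,   # shrunken class centroids
        "s_plus_s0": s + s0,
        "log_prior": np.log(n_k / n),
        # Genes that still differ from the overall centroid in some class.
        "active_genes": np.flatnonzero(np.any(d_shrunk != 0, axis=0)),
    }


def predict(model, X_new):
    """Assign each row to the class with the smallest discriminant score."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    diff = X_new[:, None, :] - model["shrunk"][None, :, :]        # (m, K, p)
    score = (diff ** 2 / model["s_plus_s0"] ** 2).sum(axis=2)     # (m, K)
    score -= 2.0 * model["log_prior"]
    return model["classes"][np.argmin(score, axis=1)]


def cv_error(X, y, delta, n_folds=5, seed=0):
    """Misclassification rate of a given delta under random k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # random, not stratified
    wrong = 0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        model = fit_shrunken_centroids(X[train], y[train], delta)
        wrong += (predict(model, X[test]) != y[test]).sum()
    return wrong / len(y)


if __name__ == "__main__":
    # Synthetic data: 60 samples, 500 "genes", only genes 0-19 informative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))
    y = np.repeat([0, 1, 2], 20)
    X[y == 1, :10] += 2.0
    X[y == 2, 10:20] -= 2.0
    for delta in (0.0, 1.0, 2.5):
        model = fit_shrunken_centroids(X, y, delta)
        print(delta, len(model["active_genes"]), cv_error(X, y, delta))
```

The loop at the bottom mirrors the idea in the text. As `delta` grows, fewer genes stay active, and the cross-validated error indicates how far the threshold can be pushed before accuracy suffers.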
