Archive for the 'math' Category

Measure of structure in residual matrix

September 14, 2009

David Skillicorn’s book Understanding Complex Datasets presents a method for picking k: the number of meaningful components from a principal components analysis / SVD.  I tried implementing the algorithm for a simulated dataset.  I generate n observations from k components of differing size in p dimensions, then add noise.

The scree plot is a plot of the variance explained by each next principal component / reduced rank approximation of the original data.  This is by definition a decreasing plot, and it is suggested to look for a flattening of the curve to indicate the point at which the remaining components are noise.

The method mentioned in Skillicorn is to look at the residual matrix after subtracting away a reduced rank approximation of the data.  This matrix either still has some structure from remaining components or is just noise.  If it is just noise, then swapping the signs of the entries should not change the 2-norm (the largest singular value) very much (I swapped the signs 10 times and took the average of all the 2-norms).  The proposed measure of residual structure is the 2-norm of the residual matrix minus the 2-norm of the altered matrix over the Frobenius norm of the residual matrix.  I then take the log of this, so the minimum is easier to see in a plot.

Here is the R code.

Here’s an easy one:

svd.structure2

Here’s a plot of one instance, where the true number of components is 15, but with lots of noise added which overpowers the smaller components.  One nice property is that it is fairly convex, because the Frobenius norm of the residual matrix in the denominator goes to zero.

svd.structure

N-ball

July 9, 2008

This is a bizarre feature of geometery that I forgot about: the volume of a the unit n-dimensional ball tends to zero as n increases past 5.  A unit n-ball (or hypersphere) is the set of points that can be reached by going 1 unit in any direction.  I think this is really counter-intuitive.

Image from Wolfram on ball.

Wikipedia on the curse of dimensionality says:

Thus, in some sense, nearly all of the high-dimensional space is “far away” from the centre, or, to put it another way, the high-dimensional unit space can be said to consist almost entirely of the “corners” of the hypercube, with almost no “middle”. (This is an important intuition for understanding the chi-squared distribution.)

Prime spirals, Hahn and Sacks

June 28, 2008

This arxiv PDF has to win the insane diagram award for 2008.  Robert Sacks of the Sacks prime spiral and Harry Hahn point out some curiosities in the distribution of primes.  Lots of triple question marks:

Smale’s paradox topology video

December 23, 2007

smale.png

This is a pretty amazing video on Smale’s paradox, which says that it is possible to turn a sphere inside-out without holes or creases if you allow for self-intersection. The question and answer gets tiring, but I think they explain a lot of topology concepts well.

Two open source packages: math and 2d physics

December 9, 2007

If you read slashdot, digg or reddit, these may already be familiar…

tower.png

Chipmunk is a 2d physics library written in C, designed for video games. Has some videos at the website. The creator credits Box2D for inspiration in the speed of the software. Wondering what they did that speed things up (when I was in 8th grade I remember spending lots of time playing with Interactive Physics), I looked up the Box2D user manual:

Box2D tends to allocate a large number of small objects (around 50-300 bytes). Using the system heap through malloc or new for small objects is inefficient and can cause fragmentation. Many of these small objects may have a short life span, such as contacts, but can persist for several time steps. So we need an allocator that can efficiently provide heap memory for these objects.

Box2D’s solution is to use a small block allocator (SBA). The SBA keeps a number of growable pools of varying sizes. When a request is made for memory, the SBA returns a block of memory that best fits the requested size. When a block is freed, it is returned to the pool. Both of these operations are fast and cause little heap traffic.

SAGE is mathematics software, aimed to be an alternative to Magma, Maple, Mathematica, and MATLAB, in which you use Python.

Lie algebras

November 21, 2007

This guy, sigfpe, writes a really simple description of Lie algebras, which I never got to in school.   All on the way to figuring out how E8 relates to the universe, and it’s subgroup, surfing.

Slope One Predictors for really simple collaborative filtering

October 16, 2007

Daniel Lemire and Anna Maclachlan have a simple method for guessing how a person will rate a thing based on collaborative filtering.  It’s named Slope One – the predictor is linear with no coefficient. Slope one’s advantage is that it is fast and easy to implement – some simple linear algebra to compute a particular unknown rating.

We propose three related slope one schemes with predictors of the form f(x) = x + b, which precompute the average difference between the ratings of one item and another for users who rated both.

…In a pairwise fashion, we determine how much better one item is liked than another. One way to measure this differential is simply to subtract the average rating of the two items. In turn, this difference can be used to predict another user’s rating of one of those items, given their rating of the other.

And then you can do a weighted Slope One to try to get even closer. From the Wikipedia article,

If a user rated several items, the predictions are simply combined using a weighted average where a good choice for the weight is the number of users having rated both items.

slopeone2.png

Given the shown ratings, we begin by observing that item 1 is rated, on average, 3 points less than item 2 or item 3. Moreover, items 1 and 2 have been corated by a single user, whereas items 1 and 3 have been corated by 2 users. From this information, and the ratings that Lucy provided for items 1 and 2, we can provide a prediction.

They experimented with 2 sets of movie data using All But One Mean Average Error and compare to other 4 other rating schemes (bias from mean, adjusted cosine item-based, etc). It looks like the range for ratings was 0-1. The Netflix prize range is 1-5 aiming for predictions with .856 RMSE, which roughly would be .21 RMSE over 0-1. I have no idea how to compare these different reports of error beyond normalizing though. It can’t be doing better than the Netflix prize of course.

I haven’t seen anyone else test this comparison or anybody comparing it to other methods they mention in the paper – Bayes, Latent Class, clustering…

slopeone.png

The paper at Lemire’s site or arXiv.

Article, examples and some demos in different languages at Wikipedia.

I saw that someone built a slope one module for Drupal in case you were thinking about creating an out-of-my-garage Amazon or Netflix.