David Skillicorn’s book Understanding Complex Datasets presents a method for picking k, the number of meaningful components in a principal components analysis / SVD. I tried implementing the algorithm on a simulated dataset: I generate n observations from k components of differing size in p dimensions, then add noise.
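A minimal sketch of that simulation in numpy (the sizes n, p, and k, the component weights, and the noise level here are my own illustrative choices, not values from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 50, 15                 # observations, dimensions, true components

# k components of differing size: random factors scaled by decaying weights
weights = np.linspace(10, 1, k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((p, k))
signal = (U * weights) @ V.T          # rank-k signal matrix
X = signal + 2.0 * rng.standard_normal((n, p))   # noisy observations
```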
The scree plot shows the variance explained by each successive principal component (equivalently, by each step of the reduced rank approximation of the original data). By definition this is a decreasing curve, and the usual advice is to look for the point where it flattens out, which indicates that the remaining components are noise.
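As a sketch, the scree values are just the normalized squared singular values of the centered data (`scree_values` is my own helper name; plotting these against component index gives the scree plot):

```python
import numpy as np

def scree_values(X):
    """Fraction of total variance explained by each principal component."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    var = s**2
    return var / var.sum()
```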
The method mentioned in Skillicorn is to look at the residual matrix after subtracting away a reduced rank approximation of the data. This matrix either still has structure from the remaining components or is just noise. If it is just noise, then randomly flipping the signs of its entries should not change the 2-norm (the largest singular value) very much (I flipped the signs 10 times and averaged the resulting 2-norms). The proposed measure of residual structure is the 2-norm of the residual matrix minus the mean 2-norm of the sign-flipped matrices, divided by the Frobenius norm of the residual matrix. I then take the log of this so the minimum is easier to see in a plot.
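A sketch of that measure as I implemented it (numpy; the function name and the choice to return the raw ratio rather than its log are mine):

```python
import numpy as np

def residual_structure(X, k, n_flips=10, seed=0):
    """(||R||_2 - mean sign-flipped ||R||_2) / ||R||_F for the rank-k residual R."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    R = X - (U[:, :k] * s[:k]) @ Vt[:k]        # residual after rank-k approximation
    flipped = [np.linalg.norm(R * rng.choice([-1.0, 1.0], size=R.shape), 2)
               for _ in range(n_flips)]
    return (np.linalg.norm(R, 2) - np.mean(flipped)) / np.linalg.norm(R, "fro")
```

Scanning k over a range and taking the log of each value gives the curve whose minimum suggests the number of components; note that once the residual really is pure noise the ratio can dip to zero or slightly negative, so the log needs a little care.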
Here’s an easy one:
Here’s a plot of one instance, where the true number of components is 15 but enough noise has been added to overpower the smaller components. One nice property is that the curve is fairly convex: past the true rank, the Frobenius norm of the residual matrix in the denominator goes to zero, which pushes the measure back up and makes the minimum easier to locate.