A music recommendation paper posted on Paul Lamere’s blog: Justin Donaldson’s Music Recommendation Mapping project, built on MyStrands data, and the accompanying paper, “ZMDS: Visualizing Structural Entropy in Queried Network Sub-Graphs” (PDF, poster).
I’m slowly understanding how many data mining techniques are related, even when they aren’t equivalent. Donaldson’s method for multidimensional scaling optimizes an energy (or stress) model much like the 2D graph clustering models I’ve posted about – Noack’s LinLog, Lehmann and Kottler – in order to highlight internally connected clusters. His method also highlights areas of “negative structural entropy,” similar to how you might use Exploratory Projection Pursuit to find a projection that is as non-Gaussian as possible.
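To make the analogy concrete, here is a minimal sketch of the raw stress objective that MDS minimizes – the same kind of quantity the energy-based layout models optimize. The triangle example and all names are my own illustration, not from the paper.

```python
import numpy as np

def stress(X, D):
    """Raw stress: squared mismatch between target distances D
    and the pairwise Euclidean distances of the 2-D layout X."""
    n = len(D)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_layout = np.linalg.norm(X[i] - X[j])
            s += (D[i, j] - d_layout) ** 2
    return s

# Toy target: three points pairwise at distance 1 (a unit triangle).
D = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
# An equilateral-triangle layout reproduces those distances exactly,
# so its stress is (numerically) zero.
X_good = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.5, np.sqrt(3) / 2]])
print(stress(X_good, D))  # ~0.0
```

An MDS solver (or a force-directed layout) searches for the `X` that drives this quantity down; the clustering methods differ mainly in how they weight the attractive and repulsive terms.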
In this project, nodes are songs, connected to each other based on co-occurrence in users’ playlists (much like Last.fm similarity). Donaldson’s approach is to query the graph at different points and compare a node’s connectedness within a neighborhood to its connectedness in the whole graph. A neighborhood is constructed by snowball sampling, expanding outward from the sampled node along edges. Nodes that are well connected in the whole graph – hub nodes – are high entropy; they appear quickly no matter where you sample, and you’d like to minimize their influence to favor other neighborhood structures. Very briefly, he takes the adjacency matrix of a neighborhood, adds a diagonal matrix holding the nodes’ global degrees, weights each row by dividing by the global degree, and then applies an MDS method to reduce to two dimensions.
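One way to read that recipe in code – this is my sketch of the description above, not the paper’s implementation; the toy graph and degrees are hypothetical, and I substitute classical MDS for whatever exact MDS variant ZMDS uses:

```python
import numpy as np

def neighborhood_embedding(A_local, global_degrees):
    """Sketch: neighborhood adjacency matrix, plus the nodes' global
    degrees on the diagonal, each row divided by the node's global
    degree, then reduced to 2-D. The reduction here is classical MDS
    via eigendecomposition of the double-centered matrix."""
    d = np.asarray(global_degrees, dtype=float)
    M = A_local + np.diag(d)      # diagonal carries global degree
    W = M / d[:, None]            # row-weight by global degree
    S = (W + W.T) / 2             # symmetrize the weighted matrix
    D = S.max() - S               # turn similarities into dissimilarities
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:2]             # top-2 eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Tiny toy neighborhood: four songs, a triangle plus one pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg_global = [3, 2, 5, 4]   # hypothetical global degrees
coords = neighborhood_embedding(A, deg_global)
print(coords.shape)  # (4, 2)
```

The row weighting is the interesting part: a hub with a large global degree gets its local edges scaled down, which is how the method discounts high-entropy nodes in favor of neighborhood-specific structure.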
We already use snowball sampling
This querying method is interesting because people already have an intuitive sense of the world as a network (if not, The Tipping Point hammered it in), and I think intuitive methods encourage use. We do this in our analog lives when we invite people to parties or ask around for good books or music. It also seems efficient in terms of computation and data transfer: graph sampling methods help define local structure and make recommendations without exploring an Internet’s worth of related data. For instance, a restaurant recommendation engine might use snowball sampling to optimize over a number of graphs: one’s friends’ preferences, restaurant similarity based on menus, and subway distance rather than as-the-crow-flies distance. The recommendation tangent is inspired by this map of NY restaurants organized by subway stop:
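For reference, the snowball sample itself is just a bounded breadth-first expansion. A minimal sketch, with a made-up restaurant graph standing in for any of the graphs above:

```python
def snowball_sample(adj, seed, rounds=2, k=3):
    """Snowball sample: start from a seed node and, for a fixed
    number of rounds, expand along up to k edges per frontier node."""
    seen = {seed}
    frontier = [seed]
    for _ in range(rounds):
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, [])[:k]:  # cap the fan-out at k
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return seen

# Hypothetical graph: edges as "people who liked X also liked Y".
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
       "D": ["B"], "E": ["C", "F"], "F": ["E"]}
print(sorted(snowball_sample(adj, "A", rounds=2)))
# → ['A', 'B', 'C', 'D', 'E']  (everything within 2 hops of A)
```

The appeal for recommendation is that `rounds` and `k` bound the work per query regardless of the total graph size, which is what makes the “don’t explore the whole Internet” property fall out for free.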