There is a sweet spot in the interplay between active and passive user data. For instance, Flickr’s interestingness ranking pulls up pictures based on: “Where the clickthroughs are coming from; who comments on it and when; who marks it as a favorite; its tags and many more things which are constantly changing.” The most ‘interesting’ photos often are very interesting, but sometimes they are simply captivating images that have little or nothing to do with the search term.
In February I wrote at FutureNow about tag spamming, where people on Flickr add every word from A to Z to their photos so that they come up in more searches, thereby increasing their ‘interestingness’. Here’s an example of a set of tags for a photo that comes up on the first results page ranked by interestingness: Action, Active, Activities, Activity, Adorable, Adult, Affection, Affectionately, Affectionate…Yellow, Young, Youngster, Youth. The combination of active data (the misapplied tags) and passive data (people clicking on a captivating image) leads to irrelevant results.
What happens as search engines start to draw on this mix of user-contributed metadata and behavior patterns? I read on ReadWriteWeb that “Yahoo! is using Flickr Interestingness to mark out the best photos and then displays them via a shortcut in results on the main search vertical.” Moving into the territory of general search will only add to the incentive to get one’s content in front of more people.
Averaging out over a large set of people might be one solution, as in the ESP Game or the ‘common tags’ you get when you do a reverse lookup of a URL at del.icio.us. You could use ESP Game results to down-weight, in ranking, the tags that people deem to be irrelevant, which would discourage their use.
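To make the averaging idea a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how the ESP Game, del.icio.us, or Flickr actually compute anything: the function names, the agreement threshold of two people, and the 0.1 penalty are all hypothetical. The idea is that a tag only counts at full weight if several people independently applied it to the same photo.

```python
from collections import Counter


def common_tags(taggings, min_agreement=2):
    """Return the tags that at least `min_agreement` different people applied.

    `taggings` is a list with one list of tags per person (hypothetical input
    format, e.g. collected from ESP Game style labeling rounds).
    """
    counts = Counter()
    for tags in taggings:
        # Count each person at most once per tag, ignoring case and padding.
        counts.update({t.strip().lower() for t in tags})
    return {tag for tag, n in counts.items() if n >= min_agreement}


def tag_weights(owner_tags, taggings, min_agreement=2, penalty=0.1):
    """Weight each owner-supplied tag for ranking purposes.

    Tags confirmed by the crowd keep weight 1.0; everything else is
    down-weighted to `penalty` (an arbitrary value chosen for this sketch).
    """
    agreed = common_tags(taggings, min_agreement)
    weights = {}
    for t in owner_tags:
        key = t.strip().lower()
        weights[key] = 1.0 if key in agreed else penalty
    return weights


# A tag-stuffed photo, plus what three independent people said about it.
owner_tags = ["Action", "Adorable", "Adult", "Yellow", "Dog", "Beach"]
taggings = [
    ["dog", "beach", "sand"],
    ["dog", "puppy", "beach"],
    ["dog", "yellow", "water"],
]

weights = tag_weights(owner_tags, taggings)
for tag, weight in sorted(weights.items()):
    print(f"{tag}: {weight}")
```

In this toy data only "dog" and "beach" reach the threshold, so the stuffed tags stay attached to the photo but contribute very little. A softer version could scale the weight by the fraction of people who agreed rather than using a hard cutoff.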
I realized after writing the title that anchor text counts as metadata and has been spammed since the very beginning…