I went to a conference on search at Berkeley that I heard about from Mike Migurski, and saw a great panel of people from industry and academia talking about natural language search.
I noticed an unfortunate kind of question and attitude from the audience. Roger Magoulas summed it up as a last mile problem: they expected anything touting natural language search to understand queries and successfully relate them to text from documents, but also to verify the accuracy of those documents. That last one is a big stretch, and something no one yet holds Google or Yahoo responsible for. For instance, one person was upset that the TextRunner demo brought up responses claiming Edison invented the light bulb.
Most of the panelists were able to explain why this is not necessarily a problem for people building search engines (why representing the spectrum of differing texts is the best option), but the question about the credibility of information was asked multiple times. It’s clear that people’s expectations for context-aware and natural language technologies are going to be high – but it would be unfortunate if people weren’t willing to experiment in these areas before we’ve got ultimate answers.
Stuart Russell from UCB had a way of looking at the natural language problem that was almost a caricature of the theoretician: “the world is a random variable that has some possible values => how does the world generate the web.” Barney Pell of Powerset and Oren Etzioni of the Turing Center had an approach that was more focused on building up from the text and language side first, rather than building an accurate representation of all knowledge of the world.
One good question at the end was: how can you represent a minority report in your results? As in, “90% of documents talk about Edison inventing the light bulb but 10% of credible sources say Tesla”.