Human Assisted Search
February 25, 2007
Posted by Andre Vellino in Search, Statistical Semantics.
I tried the human-powered “search with guide” feature on the ChaCha search engine the other day. I can’t see human-guided search becoming a business success story in the mass market – for most purposes searching is becoming sufficiently easy that we don’t need help any more.
But the idea of having a human guide to help with sophisticated searches (which has been floating around in on-line libraries for a while) may work well in a scientific digital library where the help of a trained librarian or subject specialist could really be welcome. I’m optimistic that this kind of service will be offered in on-line science libraries both because my experience at ChaCha was quite good and because I believe there are situations where there aren’t likely to be automated alternatives.
When I used ChaCha’s “Search with Guide” feature, my browser entered me into a chat session with a human search-assistant. Our chat conversation helped her (she called herself “Kimberly”) narrow down the general “Global Warming” query that I originally gave her. What I really wanted to know was “what can we do about it?” She gave me about four answers, displayed one at a time at the rate of about one every 30 seconds, all very relevant.
Now, I could have come up with the first answer myself with a query for “Global Warming Solutions” on Google or MS Live Search, so Kimberly wasn’t especially useful to me with this particular query. But you can imagine situations where only a knowledgeable human being can come up with synonyms or semantically cognate phrases.
Consider, for example, the problem of searching for the recent paper that solves the Poincaré Conjecture. If you don’t happen to know that Fields Medal nominee Grigori Perelman (I say nominee because he famously declined to accept it) solved this Millennium Problem, then you may have some trouble finding his original paper just by searching for “Poincare Conjecture” – especially in a science archive like arXiv.org which has no references to “journalistic” articles like Wikipedia. The reason is that Perelman’s paper makes no mention of the Poincaré Conjecture – this result merely follows from his solution to Thurston’s more general Geometrization Conjecture using extensions to Richard Hamilton’s theory of Ricci Flow (all of which, incidentally, are completely beyond my comprehension).
I think this kind of knowledge still requires a human brain, because statistical semantics just doesn’t see the relevant word-frequency patterns often enough, in a large enough corpus, to induce this information. Furthermore, there is, I think, a historical component to this kind of knowledge (first X happened, then Y, etc.) which I don’t think statistical frequency patterns can reflect easily.
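To make the co-occurrence point a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the four "abstracts" in `TOY_CORPUS` are made up by hand (they are not real arXiv text), the helper names `cooccurrence_counts` and `related_terms` are hypothetical, and plain document-level co-occurrence counting stands in for whatever a real statistical-semantics system would do.

```python
from collections import Counter
from itertools import combinations

# Toy, hand-written "abstracts" (illustrative only, NOT real arXiv text).
TOY_CORPUS = [
    "ricci flow entropy formula geometric applications",
    "ricci flow with surgery on three manifolds",
    "geometrization conjecture three manifolds ricci flow",
    "poincare conjecture history of topology",
]


def cooccurrence_counts(docs):
    """Count how many documents contain each unordered pair of distinct words."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))
        # Each pair comes out as an alphabetically ordered tuple.
        counts.update(combinations(words, 2))
    return counts


def related_terms(term, counts):
    """Return the words that share a document with `term`, most frequent first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [word for word, _ in scores.most_common()]


counts = cooccurrence_counts(TOY_CORPUS)
poincare_neighbours = related_terms("poincare", counts)
ricci_neighbours = related_terms("ricci", counts)

print("poincare ->", poincare_neighbours)
print("ricci    ->", ricci_neighbours)
```

In this toy corpus "ricci" never shares a document with "poincare", so direct co-occurrence gives no reason to expand a “Poincare Conjecture” query towards Ricci Flow; a person who knows the history makes that connection immediately.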