Baynote CF for IngentaConnect
December 1, 2007
Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender, Recommender service.
It’s validating to see that there are commercial offerings for recommender systems in digital libraries. My colleague Richard Akerman, blogger extraordinaire for all things digital library, pointed me to this announcement by IngentaConnect that it is using Baynote to provide collaborative filtering recommendations for scholarly publications. The announcement reads:
… a new partnership with content guidance pioneer Baynote will see IngentaConnect providing “more like this” article recommendations based on both the current context but also, more unusually in scholarly publishing, the user’s previous behaviour. Articles reviewed or acquired by users with similar interests and behaviour will be recommended for consideration and potential purchase.
The “more about how it works” section reads:
… context and behaviour are combined to determine the user’s intent, which is then analysed for relevance to that of the site’s other users; patterns that emerge from this analysis are used to recommend additional content which is more likely to be of interest and relevance to the user than regular, contextual recommendations. Sophisticated behavioural analysis monitors not simply clicks and page views, but also the length of time that a user spends on the page and the type of activities that they carry out there.
I’m not quite sure what “regular, contextual recommendations” means exactly (probably TF*IDF-based content-similarity “more like this” recommendations), but I think the overall claim from Baynote is that clickstream data holds the secret to harvesting implicit user ratings for collaborative filtering recommendations.
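For comparison, here is a minimal sketch of the kind of TF*IDF content-similarity “more like this” I am guessing at. It assumes each article is represented by a short text such as its abstract, uses a deliberately naive whitespace tokenizer, and the function names are hypothetical; it is not IngentaConnect’s or Baynote’s actual method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (deliberately naive)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def more_like_this(docs, index, k=2):
    """Return indices of the k documents most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), j) for j, v in enumerate(vecs) if j != index]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```

Note that nothing here depends on who else read the article: the recommendations come purely from the text, which is exactly what Baynote’s behavioural approach claims to improve on.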
The marketing blurb on the Baynote web site says:
By silently observing more than twenty different user actions on your site, Baynote identifies virtual communities of like-minded visitors who have similar intent.
Baynote identifies emerging patterns among the visitors which represents the collective wisdom of the crowd. These emerging patterns represents the true intent of the visitors.
Sounds too simple to be true, doesn’t it? However good your implicit rating scheme is, there has to be a lot more going on behind the scenes. If Baynote does the CF recommendations for the customer (the library), then it must at least have a catalogue of the customer’s offerings (to make the item recommendations) as well as user-browsing data (to do the collaborative filtering). Unless, of course, the items being recommended are advertisers’ items, in which case the catalogue isn’t the library’s but the advertisers’.
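To make the two required inputs concrete, here is a minimal sketch that turns browsing events into implicit ratings and then recommends only items that exist in the customer’s catalogue. The action weights and the dwell-time cap are hypothetical (Baynote does not say which of its “more than twenty” actions it uses or how it weights them), and plain item-to-item cosine similarity stands in for whatever CF algorithm Baynote actually runs.

```python
import math
from collections import defaultdict

# Hypothetical weights for a few user actions; purely illustrative numbers.
ACTION_WEIGHTS = {"view": 1.0, "download": 3.0, "purchase": 5.0}
MAX_DWELL_SECONDS = 300  # hypothetical cap so an abandoned tab does not dominate


def implicit_ratings(events):
    """Turn (user, item, action, dwell_seconds) events into user -> item -> score."""
    ratings = defaultdict(lambda: defaultdict(float))
    for user, item, action, dwell in events:
        dwell_bonus = min(dwell, MAX_DWELL_SECONDS) / MAX_DWELL_SECONDS
        ratings[user][item] += ACTION_WEIGHTS.get(action, 0.0) + dwell_bonus
    return ratings


def item_similarity(ratings, a, b):
    """Cosine similarity between two items' columns of the implicit-rating matrix."""
    dot = sum(r.get(a, 0.0) * r.get(b, 0.0) for r in ratings.values())
    na = math.sqrt(sum(r.get(a, 0.0) ** 2 for r in ratings.values()))
    nb = math.sqrt(sum(r.get(b, 0.0) ** 2 for r in ratings.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings, catalogue, user, k=3):
    """Rank unseen catalogue items by similarity to the items the user engaged with."""
    seen = ratings.get(user, {})
    scores = {}
    for candidate in catalogue:  # only items the customer actually offers
        if candidate in seen:
            continue
        score = sum(w * item_similarity(ratings, candidate, item) for item, w in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Without the `catalogue` argument there is nothing to recommend, and without the browsing events there is nothing to compute similarity from, which is the point made above.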
This kind of approach will no doubt be better than plain content analysis for advertisers. But my hunch is that this isn’t likely to work that well for end-users of a digital library.
For one thing, there’s the privacy issue with sparse, anonymized data sets: even with names stripped out, a handful of distinctive items is often enough to single out an individual’s session. Nobody seems to mind Google Analytics on e-commerce web sites and blogs, but what an end user searches for and browses in a digital library could be highly confidential. Imagine a forensic pathologist investigating the death of Alexander Litvinenko and searching for scientific data on the toxicity of polonium-210. The browsing behaviour of such a session might not be something that should be used to provide recommendations for other users.
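A toy illustration of why sparsity undermines anonymization: when each profile touches only a few items out of a large collection, most profiles are unique, so knowing just a couple of items someone looked at can pick out their “anonymous” record. The profiles and helper names below are invented for illustration only.

```python
from collections import Counter


def matching_profiles(profiles, known_items):
    """Return the anonymized IDs whose item sets contain every known item."""
    known = set(known_items)
    return [pid for pid, items in profiles.items() if known <= items]


def unique_fraction(profiles):
    """Share of profiles whose exact item set appears only once in the data."""
    counts = Counter(frozenset(items) for items in profiles.values())
    unique = sum(1 for items in profiles.values() if counts[frozenset(items)] == 1)
    return unique / len(profiles) if profiles else 0.0


# Invented, deliberately sparse browsing profiles keyed by anonymized IDs.
PROFILES = {
    "anon-17": {"polonium-210 toxicity", "radiation dosimetry", "forensic pathology"},
    "anon-42": {"radiation dosimetry", "nuclear medicine"},
    "anon-88": {"forensic pathology", "toxicology methods"},
}
```

Here every profile is unique, and knowing that someone looked at both “radiation dosimetry” and “forensic pathology” narrows the candidates to a single ID.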
Also, if Baynote is following the DL recommender research from the GroupLens team, the recommender service needs more than just the items in the catalogue: it also needs citation metadata to seed its ratings matrices, the way TechLens does. Yet unless the collection’s catalogue is highly homogeneous or has a large number of well-referenced entries, this may not be a feasible strategy, because relatively few of the cited references will themselves have entries in the collection’s catalogue.
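A minimal sketch of citation seeding, loosely in the spirit of TechLens, where a citing paper plays the role of a “user” and each paper it cites is an “item” it rates. This is an approximation for illustration, not TechLens code, and the paper identifiers are hypothetical. It also measures the sparsity problem: only references that resolve to catalogue entries can populate the matrix.

```python
def citation_ratings(citations, catalogue):
    """Build a binary 'ratings matrix' from citation lists.

    Each citing paper acts as a user; each cited paper that is also in the
    catalogue acts as an item rated 1. Papers with no in-catalogue
    references contribute nothing and are dropped.
    """
    matrix = {}
    for paper, refs in citations.items():
        in_catalogue = {r for r in refs if r in catalogue}
        if in_catalogue:
            matrix[paper] = {r: 1 for r in in_catalogue}
    return matrix


def reference_coverage(citations, catalogue):
    """Fraction of all references that resolve to an entry in the catalogue."""
    total = sum(len(refs) for refs in citations.values())
    hits = sum(1 for refs in citations.values() for r in refs if r in catalogue)
    return hits / total if total else 0.0
```

With a catalogue of three papers whose nine references mostly point outside the collection (say `{"p1": ["p2", "x1", "x2", "x3"], "p2": ["x4", "x5"], "p3": ["p1", "x6", "x7"]}`), only two of the nine references are usable and one paper drops out of the matrix entirely.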
At any rate, this is an interesting development and I’m looking forward to finding out more about how this approach works.