
Baynote CF for IngentaConnect December 1, 2007

Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender, Recommender service.

It’s validating to see that there are commercial offerings for recommender systems in digital libraries. My colleague Richard Akerman, blogger extraordinaire for all things digital library, pointed me to this announcement by IngentaConnect declaring that they are using Baynote to provide collaborative filtering recommendations for scholarly publications. The announcement reads:

… a new partnership with content guidance pioneer Baynote will see IngentaConnect providing “more like this” article recommendations based on both the current context but also, more unusually in scholarly publishing, the user’s previous behaviour. Articles reviewed or acquired by users with similar interests and behaviour will be recommended for consideration and potential purchase.

The “more about how it works” section reads:

… context and behaviour are combined to determine the user’s intent, which is then analysed for relevance to that of the site’s other users; patterns that emerge from this analysis are used to recommend additional content which is more likely to be of interest and relevance to the user than regular, contextual recommendations. Sophisticated behavioural analysis monitors not simply clicks and page views, but also the length of time that a user spends on the page and the type of activities that they carry out there.

I’m not quite sure what “regular, contextual recommendations” means exactly – probably TF*IDF-based content-similarity “more like this” – but I think the overall claim from Baynote is that clickstream data holds the secret to harvesting implicit user ratings for collaborative filtering recommendations.
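To make that idea concrete, here is a minimal sketch of how clickstream events might be turned into implicit ratings and fed to a user-based collaborative filter. This is not Baynote’s algorithm, which isn’t public as far as I know; the event weights, the dwell-time cap and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weights per action: illustrative values only, not Baynote's.
EVENT_WEIGHTS = {"view": 1.0, "download": 3.0, "purchase": 5.0}


def implicit_ratings(events, dwell_cap=120.0):
    """Turn (user, item, action, dwell_seconds) events into user -> item -> score."""
    ratings = defaultdict(lambda: defaultdict(float))
    for user, item, action, dwell in events:
        # Dwell time contributes at most 1.0, so an idle tab cannot dominate.
        dwell_bonus = min(dwell, dwell_cap) / dwell_cap
        ratings[user][item] += EVENT_WEIGHTS.get(action, 0.0) + dwell_bonus
    return ratings


def cosine(a, b):
    """Cosine similarity between two sparse item -> score dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not touched by similarity-weighted scores of other users."""
    mine = ratings.get(user, {})
    scores = defaultdict(float)
    for other, items in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, items)
        if sim <= 0.0:
            continue
        for item, score in items.items():
            if item not in mine:
                scores[item] += sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up sample clickstream: (user, item, action, seconds on page).
events = [
    ("alice", "p1", "view", 60),
    ("alice", "p2", "download", 30),
    ("bob", "p1", "view", 120),
    ("bob", "p3", "purchase", 10),
    ("carol", "p4", "view", 5),
]
ratings = implicit_ratings(events)
print(recommend(ratings, "alice"))
```

Even in this toy version the weak point is visible: a user like carol, who shares no items with anyone, neither receives nor contributes recommendations, which is the sparsity problem in miniature.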

The marketing blurb on the Baynote web site says:

By silently observing more than twenty different user actions on your site, Baynote identifies virtual communities of like-minded visitors who have similar intent.

….

Baynote identifies emerging patterns among the visitors which represents the collective wisdom of the crowd. These emerging patterns represents the true intent of the visitors.

Embed a couple of Baynote JavaScript tags (one to monitor users and one to display Baynote’s recommendations) on your web site and, voilà, you have yourself a recommender service.

Sounds too simple to be true, doesn’t it? However good your implicit rating scheme is, there has to be a lot more going on behind the scenes. If Baynote does the CF recommendations for the customer (library), then it must at least have a catalogue of the customer’s offerings (to make the item recommendations) as well as user-browsing data to do the collaborative filtering. Unless, of course, the items that are being recommended are advertisers’ items, in which case the catalogue isn’t the library’s but the advertisers’.

This kind of approach will no doubt be better than plain content analysis for advertisers. But my hunch is that this isn’t likely to work that well for end-users of a digital library.

For one thing, there’s the privacy issue with sparse, anonymized data-sets. Nobody seems to mind Google Analytics on e-commerce web sites and blogs, but what the end-user is searching and browsing in a digital library could be highly confidential. Imagine a forensic pathologist investigating the death of Alexander Litvinenko and searching for scientific data on the toxicity of Polonium 210. The browsing behaviour of such a session might not be something that should be used to provide recommendations for other users.

Also, if Baynote is following the DL recommender research from the GroupLens team, the recommender service needs more than just the items in the catalogue: it also needs citation metadata to seed its ratings matrices, the way TechLens does. Yet unless the collection’s catalogue is highly homogeneous or has a large number of well-referenced entries, this may not be a feasible strategy, because too few of the cited references will themselves have entries in the collection’s catalogue.
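A rough sketch of what seeding from citations could look like, assuming each citing paper plays the role of a “user” and each paper it cites the role of a rated “item”. The function name, the data layout and the sample identifiers are hypothetical, and this is not TechLens code. The coverage figure it returns is the fraction of references that resolve to catalogue entries, which is the number that decides whether the strategy is feasible.

```python
def citation_matrix(catalogue, references):
    """Build a binary citing-paper -> cited-papers matrix restricted to the catalogue.

    catalogue:  set of paper ids held in the collection
    references: dict mapping a paper id to the list of paper ids it cites
    Returns (matrix, coverage), where coverage is the share of distinct
    references that point at papers inside the catalogue.
    """
    matrix = {}
    total = resolved = 0
    for paper in catalogue:
        cited = set(references.get(paper, []))
        in_catalogue = cited & catalogue
        total += len(cited)
        resolved += len(in_catalogue)
        # Only in-catalogue citations can act as "ratings".
        matrix[paper] = in_catalogue
    coverage = resolved / total if total else 0.0
    return matrix, coverage


# Made-up example: most references point outside the collection.
catalogue = {"a", "b", "c"}
references = {"a": ["b", "x", "y"], "b": ["c"], "c": ["z"]}
matrix, coverage = citation_matrix(catalogue, references)
print(matrix, coverage)
```

With a heterogeneous collection, most rows of this matrix end up empty or nearly so, and the collaborative filter has very little to work with.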

At any rate, this is an interesting development and I’m looking forward to finding out more about how this approach works.

Comments»

1. lemire - December 1, 2007

There has been a lot of work done on privacy in recommender systems.

The problem is difficult if and only if everyone’s work is confidential. It helps to think of peer-to-peer networks. There are seeders (those who provide data) and leeches. Recommender systems can work the same way, at least in principle. There are some people who share their data and others who only want to consume it.

Currently, most searches I do in “digital libraries” are not confidential. (My main digital library being Google Scholar!) For a time, I even tried posting the different research papers I was reading on my blog, hoping this might help others who do similar research (but this led to a bad case of information overload since there is little structure in what I browse).

However, there are instances where we might want to consult a digital library in a confidential way. Sure. There we could just make sure that whatever data is collected stays local (on a secure PC).

The smarter and more complex alternative is to collect only aggregated data and hope that nobody can reverse-engineer your data. This is hazardous, I think, even with the best theoretical models.

2. Andre Vellino - December 2, 2007

Yes, there has been a lot of research in the area of privacy. But I wasn’t (especially) worried about the disclosure of personal identity information, as in the recent hack of the Netflix dataset (http://arxiv.org/abs/cs/0610105). My worry had to do with the fact that CF information was being used at all. Perhaps the person browsing about Polonium 210 works for the FBI and the co-downloading facts about his (or her) browsing behaviour shouldn’t be used in any subsequent recommendations to anyone else.

This is not the usual scenario for you and me – but definitely a use-case for researchers in startup companies where the number of possible users of the information is very small. The very fact that someone who downloaded papers on Gallium Arsenide also downloaded papers on SiGe FPGAs might be of some competitive advantage to the next company using this information resource.

