
Evaluating Article Recommenders July 23, 2009

Posted by Andre Vellino in Collaborative filtering, Recommender.

In his March article for CACM, Greg Linden opines that RMSE (Root Mean Square Error) and similar measures of recommender accuracy are not necessarily the best ways to assess their value to users.  He suggests that Top-N measures may be preferable if the problem is to predict what someone will really like.

“A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies.  RMSE equally penalizes errors on movies you do not care about seeing as it does errors on great movies, but perhaps what we really care about is minimizing the error when predicting great movies.”
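Linden's point can be made concrete with a small sketch. The ratings and predictions below are invented for illustration: model 1 makes small errors everywhere (good RMSE) but misranks the items just below the top, while model 2 is sloppy on items the user doesn't care about yet ranks the great ones correctly (worse RMSE, better precision at the top of the list).

```python
import math

# Hypothetical true ratings: A and B are the "great" items.
actual = {"A": 5.0, "B": 4.0, "C": 3.5, "D": 1.0}

# Model 1: small errors everywhere, but C edges out B in the ranking.
pred1 = {"A": 4.4, "B": 3.6, "C": 3.8, "D": 1.0}
# Model 2: large errors on the low-rated items, accurate on the great ones.
pred2 = {"A": 5.0, "B": 4.0, "C": 2.0, "D": 2.5}

def rmse(pred):
    return math.sqrt(sum((pred[i] - actual[i]) ** 2 for i in actual) / len(actual))

def top_n(scores, n=2):
    # The n items the model would actually surface to the user.
    return set(sorted(scores, key=scores.get, reverse=True)[:n])

hits = top_n(actual)  # the items a Top-N measure cares about: {A, B}
for name, pred in [("model 1", pred1), ("model 2", pred2)]:
    precision = len(top_n(pred) & hits) / 2
    print(name, "RMSE:", round(rmse(pred), 3), "precision@2:", precision)
```

Model 1 wins on RMSE (about 0.39 vs. 1.06) yet recommends C over B, so its precision@2 is 0.5 against model 2's 1.0: the two measures pick different winners.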

This problem is compounded when it isn’t even possible to measure errors of any kind. Suppose you have an item-based recommender for journal articles in a digital library and recommendations are restricted to items in the collection owned by the library.  These recommendations are then restricted to a certain set which may be incommensurable with recommendations generated from a different collection. So any quality measure would depend on the size of the collection.

How then would one go about evaluating recommendations in this circumstance?  One way is for an expert to inspect the results and judge them for relevance or quality.  Another is to measure some meta-properties of the recommendations, such as their semantic distance from one another or from the item they are being recommended from.  At least you would be able to say that one recommender offers greater novelty or diversity than another.
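One such meta-property, intra-list diversity, can be sketched as the average pairwise distance between the items in a recommendation list. The keyword sets below are invented, and Jaccard distance stands in for whatever semantic distance a real system would use:

```python
from itertools import combinations

def jaccard_distance(a, b):
    # 1 minus the overlap ratio of two keyword sets: 0 = identical, 1 = disjoint.
    return 1.0 - len(a & b) / len(a | b)

def intra_list_diversity(items):
    # Average pairwise distance across the recommendation list.
    pairs = list(combinations(items.values(), 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

rec_list_narrow = {  # "more like this" recommendations
    "p1": {"recommender", "evaluation", "rmse"},
    "p2": {"recommender", "evaluation", "topn"},
    "p3": {"recommender", "rmse", "topn"},
}
rec_list_broad = {  # recommendations ranging over several topics
    "p1": {"recommender", "evaluation", "rmse"},
    "p2": {"music", "playlist", "novelty"},
    "p3": {"citation", "library", "openurl"},
}

print(intra_list_diversity(rec_list_narrow))  # lower diversity
print(intra_list_diversity(rec_list_broad))   # higher diversity
```

The point is only that such a score lets you compare two recommenders on the same footing without any ground-truth ratings.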

This is the kind of approach taken by Òscar Celma and Perfecto Herrera in a paper delivered at Recommender Systems 2008. They concluded that content-based recommendations for music that are less biased by popularity (i.e. more biased toward content similarity) produced less novelty in recommendations and also lower user satisfaction.

While music listeners may appreciate novelty and diversity, my expectation is that users of recommenders for scholarly articles actually want something closer to “more like this” (content similarity) than “other users who looked at this also looked at that” (collaborative filtering).

At least that’s the conclusion (not yet scientifically corroborated) that I came to when I compared a usage-only recommender (‘bX’ from Ex Libris) to a citation-only recommender for scholarly articles (Synthese). At first blush ‘bX’ produces more “interesting” recommendations (greater diversity) whereas Synthese (in citation-only mode anyway) generates more “similar” recommendations.

Perhaps what the user needs is both kinds of recommenders – depending on their information retrieval needs.


1. Daniel Lemire - July 25, 2009

I think that the whole Netflix/RMSE thing is not where the next step is in recommender systems. We need to broaden the applications.

Have you tried something like this: papers that have cited papers X, Y, Z have also cited paper W? I’d love to get an analysis of a paper I am about to submit to see whether I have omitted any reference… It would be cool to determine, mathematically, whether a set of references is “complete” in some sense.

(There has been a lot of work done on mining frequent “minimal” item sets. I am guessing it could be applied to this problem.)

Minimally, just this feature: papers which have cited this paper have also cited this other paper… that’d be great.
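[The co-citation lookup Daniel describes can be sketched in a few lines. The citation data and the `min_overlap` threshold below are invented for illustration; a real system would work over a citation index rather than an in-memory dict.]

```python
from collections import Counter

# Hypothetical citation index: paper id -> set of papers it cites.
citations = {
    "p1": {"X", "Y", "Z", "W"},
    "p2": {"X", "Y", "W"},
    "p3": {"X", "Y", "Z", "W", "V"},
    "p4": {"Y", "Z", "V"},
}

def suggest_references(draft_refs, citations, min_overlap=2):
    # Count what else is cited by papers whose reference lists
    # overlap the draft's references in at least min_overlap items.
    counts = Counter()
    for cited in citations.values():
        if len(cited & draft_refs) >= min_overlap:
            counts.update(cited - draft_refs)
    return counts.most_common()

print(suggest_references({"X", "Y", "Z"}, citations))  # W outranks V
```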

Andre Vellino - July 25, 2009

The next version of Synthese will be exactly what you suggest – the “minimal” feature, that is – (which is, of course, simpler than what I was experimenting with). Adding the controls you want with your second suggestion is pretty easy too, except for figuring out how the UI should look.

The data I have is *very* sparse. On about 8M science articles, I’m only able (currently) to produce item-based recommendations (based on citations only) for about 1.8M of them. I’m now working on reducing the sparsity of my data….

There’s also going to be an OpenURL API, very much like ‘bX’ – submit an OpenURL to Synthese and you get back metadata about the recommended articles in XML. Should be done in September sometime.

2. Daniel Lemire - July 25, 2009

Then, you could go further:

Suppose that I say “No, I don’t want to cite paper W even though I plan to cite papers X, Y, Z”… then what does it say? Can you then focus on papers (if any) citing papers X, Y, Z but not paper W?

The fun thing is that these are small data sets (very sparse) so you can do the computations live with little effort.
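[Daniel's refinement – papers citing X, Y, Z but deliberately not W – amounts to one extra filter on the same kind of query. A standalone sketch, again on invented citation data:]

```python
# Return, for each paper that cites everything in must_cite but nothing in
# must_not_cite, the other papers it cites.
def focus_query(citations, must_cite, must_not_cite):
    return {
        pid: cited - must_cite
        for pid, cited in citations.items()
        if must_cite <= cited and not (must_not_cite & cited)
    }

citations = {
    "p1": {"X", "Y", "Z", "W"},  # cites W -> excluded
    "p2": {"X", "Y", "Z", "V"},  # qualifies
    "p3": {"X", "Z", "V"},       # missing Y -> excluded
}

print(focus_query(citations, {"X", "Y", "Z"}, {"W"}))
```

Because the per-query candidate sets are small and sparse, as Daniel notes, this kind of filtering is cheap enough to run interactively.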

Andre Vellino - July 25, 2009

Interesting suggestion. Thanks Daniel.
