
PageRank Effect on Collaborative Filtering July 30, 2008

Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender.

I have run some experiments on the impact of PageRank on a collaborative filtering recommender for journal articles. The results are counterintuitive, to me at least, but I think there may be a plausible explanation (I’m working on one).

I followed in the footsteps of TechLens+ and used article references as a proxy for “ratings” – in other words, assume that one article citing another means a (boolean) “positive vote” for the cited article. It’s a poor approximation, but it addresses the cold-start problem for a digital library recommender.
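As a rough illustration of the citation-as-rating idea, here is a minimal sketch in Python. The toy citation list and the function name `boolean_ratings` are made up for the example; each citing article plays the role of a "user" and each article it cites is a positive vote.

```python
from collections import defaultdict

# Hypothetical toy data: (citing_article, cited_article) pairs.
citations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("D", "B")]

def boolean_ratings(citations):
    """Map each citing article (the 'user') to the set of articles it cites.

    Membership in the set is the boolean "positive vote"; absence means
    there is no rating at all, not a negative one.
    """
    ratings = defaultdict(set)
    for citing, cited in citations:
        ratings[citing].add(cited)
    return dict(ratings)

ratings = boolean_ratings(citations)
print(ratings)
```

Note that an article that cites nothing in the collection (here "C") never appears as a "user", only as a rated item.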

The idea behind using PageRank was to refine these boolean ratings and rank them on a scale. Using numeric PageRank values for the ratings (rather than a boolean value) has a surprising effect: Top-N prediction quality goes down! Furthermore, random values in place of PageRank perform about the same as boolean (constant) values.
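To make the setup concrete, here is a small sketch of one way it could be done; it is not the code used in the experiments. It assumes that each rating is replaced by the PageRank of the cited article, a damping factor of 0.85, and uniform redistribution of rank from articles that cite nothing. The function name `pagerank` and the toy data are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed citation graph.

    edges: iterable of (citing, cited) pairs.
    Rank held by articles with no outgoing citations is spread uniformly
    (an assumption of this sketch).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical toy data: (citing_article, cited_article) pairs.
citations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("D", "B")]
pr = pagerank(citations)

# Replace each boolean rating with the PageRank of the cited article.
weighted = {}
for citing, cited in citations:
    weighted.setdefault(citing, {})[cited] = pr[cited]
print(weighted)
```

In this toy graph the most-cited article "C" ends up with the largest weight, so a vote for it counts for more than a vote for "B".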

I trust that Daniel Lemire is right about the value of negative results.


1. Daniel Lemire - July 30, 2008

Well. If you did come up with an article that said “pagerank does not work for co. fi.” I think it would be a valuable contribution.

Argh! You are not giving us enough details to comment on your algorithms though… I am not sure what “constant” versus “boolean” means here.

Do you actually compute pagerank over the citation graph? Wouldn’t your random walk always go back in time? (Except for a probability of p=0.15 to “jump out”?)

A natural thing to compare against is the “number of citations” a paper received (computed the obvious way).
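(A sketch of that baseline, added for illustration and not part of the original comment: counting citations "the obvious way" could look like the following, where the toy data and the name `citation_counts` are hypothetical.)

```python
from collections import Counter

# Hypothetical toy data: (citing_article, cited_article) pairs.
citations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("D", "B")]

def citation_counts(citations):
    """Count how many distinct articles cite each article."""
    # set() drops duplicate pairs so a repeated citation is counted once.
    return Counter(cited for _citing, cited in set(citations))

counts = citation_counts(citations)
print(counts.most_common())
```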

2. Andre Vellino - July 30, 2008

Thanks Daniel. I’m working on the paper (with approximately that title!)

I’m treating “constant” and “boolean” interchangeably. PageRankValue=constant (if there is a reference) is the same (as far as rank-ordering Top-N recommendations) as PageRankValue=1 (or 0 if there is no reference).
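(An illustrative sketch, not the recommender used in the experiments: with a simple overlap-weighted scoring rule, multiplying every rating by the same positive constant multiplies every score by that constant, so the Top-N order is unchanged. The scoring rule, the data, and the names `build_ratings` and `top_n` are assumptions made for this example.)

```python
# Hypothetical toy data: (citing_article, cited_article) pairs.
citations = [
    ("P1", "X"), ("P1", "Y"),
    ("P2", "X"), ("P2", "Y"), ("P2", "Z"),
    ("P3", "X"), ("P3", "Z"), ("P3", "W"),
    ("P4", "Y"), ("P4", "W"),
]

def build_ratings(citations, value):
    """Give every existing citation the same rating value."""
    ratings = {}
    for citing, cited in citations:
        ratings.setdefault(citing, {})[cited] = value
    return ratings

def top_n(ratings, target, n=2):
    """Rank unseen items by the sum of (shared-item count * rating) over other users."""
    seen = set(ratings[target])
    scores = {}
    for user, items in ratings.items():
        if user == target:
            continue
        overlap = len(seen & set(items))  # depends only on which items are shared
        if overlap == 0:
            continue
        for item, value in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + overlap * value
    # Sort by descending score, breaking ties by item name.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

as_boolean = top_n(build_ratings(citations, 1.0), "P1")
as_constant = top_n(build_ratings(citations, 0.37), "P1")
print(as_boolean, as_constant)
```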

Yes, I compute the pagerank over the entire citation graph and yes a random walk would go back in time. Any special reason to be concerned about that?

Good suggestion to compare with citation index. I hadn’t thought of that – the obvious ideas are often the best!

3. Daniel Lemire - August 3, 2008

This is fine if your graph points backward in time, I just wanted to make sure I understood.
