PageRank Effect on Collaborative Filtering July 30, 2008
Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender. 3 comments
I have run some experiments on the impact of PageRank on a collaborative filtering recommender for journal articles. The results are counterintuitive – to me anyway – but I think they might have a plausible explanation (I’m working on one).
I followed in the footsteps of TechLens+ and used article references as a proxy for “ratings” – in other words, assume that one article citing another means a (boolean) “positive vote” for the cited article. It’s a poor approximation, but it addresses the cold-start problem for a digital library recommender.
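As a rough illustration of the citation-as-rating idea, here is a minimal Python sketch. The article identifiers and the `citations` data are made up for the example, and `boolean_ratings` is a hypothetical helper, not the code used in the experiments: each citing article plays the role of a "user" and each article it references gets a rating of 1.

```python
from collections import defaultdict

# Hypothetical citation data: each citing article maps to the articles it references.
citations = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def boolean_ratings(citations):
    """Treat each citing article as a 'user' that gives every cited article a rating of 1."""
    ratings = defaultdict(dict)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            ratings[citing][cited] = 1  # boolean "positive vote"
    return dict(ratings)


ratings = boolean_ratings(citations)
print(ratings)
```

Articles that cite nothing (like "D" here) end up with no ratings at all, which is one reason the approximation is a poor one.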
The idea behind using PageRank was to refine these boolean ratings by ranking them on a numeric scale. Replacing the boolean ratings with numeric PageRank values has a surprising effect: Top-N prediction quality goes down! Furthermore, using random values in place of PageRank gives about the same results as using boolean (constant) values.
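To make the weighting step concrete, here is one possible way to sketch it in Python. Everything in it is an assumption for illustration: the toy `citations` graph, the damping factor of 0.85, the fixed iteration count, and the choice to use the raw PageRank score of the cited article as the rating value. How the scores were actually scaled in the experiments is not stated here.

```python
# Hypothetical citation graph: citing article -> list of cited articles.
citations = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def pagerank(citations, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the citation graph."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by articles with no references is spread evenly over all articles.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def weighted_ratings(citations, rank):
    """Replace each boolean vote with the PageRank score of the cited article."""
    return {
        citing: {cited: rank[cited] for cited in cited_list}
        for citing, cited_list in citations.items()
        if cited_list
    }


rank = pagerank(citations)
weighted = weighted_ratings(citations, rank)
print(weighted)
```

Swapping `rank` for a dictionary of random numbers, or for a constant, gives the random and boolean baselines mentioned above.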
I trust that Daniel Lemire is right about the value of negative results.
Digg Recommender July 29, 2008
Posted by Andre Vellino in Collaborative filtering. 1 comment so far
I’m a little behind on my summer blogging (reading and writing) and seem to have missed Digg’s announcement for their recommendation feature.
I don’t Digg myself, but from the video demo (on the blog link above), it looks like they have struck a nice balance between pure collaborative filtering, topic classification and serendipity of recommendations (from different samples of users within a given similarity neighbourhood.)
Canada #1 in Computer Science July 10, 2008
Posted by Andre Vellino in CISTI, Data Mining.
Glen Newton alerts us to a recent article published in Scientometrics from which he deduced that Canada is the #1 producer of Computer Science research papers (per capita). This doesn’t come as a complete surprise, given the #6 ranking that I had noted Canada holds in overall scientific publication output.
Distracted July 3, 2008
Posted by Andre Vellino in Collaborative filtering, Digital library, Information retrieval, Recommender service. 2 comments
I heard journalist Maggie Jackson speaking on the radio this morning about her new book, Distracted: The Erosion of Attention and the Coming Dark Age.
The Wall Street Journal review of the book relates that:
So the answer to information overload may be to practice Buddhist meditation.
Alternatively, you could go for the “technology fix”. On the same radio show they interviewed Jon Herlocker, one of the founders of Smart Desktop, an outgrowth of his research on TaskTracer. Jon comes from the world of Collaborative Filtering – he founded the music recommender Music Strands (now MyStrands) and has also worked on recommending documents in a digital library.
Which do you think is more likely to work?
