
PageRank Effect on Collaborative Filtering July 30, 2008

Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender.

I have done some experiments on the impact of PageRank on a collaborative filtering recommender for journal articles. The results are counterintuitive – to me anyway – but I think they might have a plausible explanation (I’m working on one).

I followed in the footsteps of TechLens+ and used article references as a proxy for “ratings” – in other words, assume that one article citing another means a (boolean) “positive vote” for the cited article. It’s a poor approximation, but it addresses the cold-start problem for a digital library recommender.
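To make the citation-as-rating idea concrete, here is a minimal sketch. It is not the TechLens+ algorithm or the recommender used in these experiments: the toy citation data, the function names and the simple overlap-based scoring are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: citing article -> articles it references.
citations = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "F": ["D", "E"],
}

def boolean_ratings(citations):
    """Treat each citing article as a 'user' that rates every cited article 1.0."""
    return {
        citing: {cited: 1.0 for cited in refs}
        for citing, refs in citations.items()
    }

def recommend(ratings, user):
    """Score articles the user has not cited, weighting each neighbour
    by the number of cited articles it shares with the user."""
    seen = ratings[user]
    scores = defaultdict(float)
    for other, items in ratings.items():
        if other == user:
            continue
        overlap = sum(1 for item in items if item in seen)
        if overlap == 0:
            continue  # no shared citations, so not a neighbour
        for item, value in items.items():
            if item not in seen:
                scores[item] += overlap * value
    return dict(scores)

ratings = boolean_ratings(citations)
recs = recommend(ratings, "A")
```

Article "A" has no reader ratings at all, yet its reference list is enough to place it next to "B" and "F", which is how citations sidestep the cold-start problem.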

The idea behind using PageRank was to refine these boolean ratings by placing them on a numeric scale. Using numeric PageRank values as the ratings (rather than a boolean value) has a surprising effect: Top-N prediction quality goes down! Furthermore, substituting random values for PageRank gives about the same prediction quality as boolean (constant) values.
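A rough sketch of what replacing boolean ratings with PageRank values can look like follows. It assumes a plain power-iteration PageRank over the citation graph and a rating equal to the cited article's PageRank divided by the largest PageRank; the damping factor, the normalisation and the toy data are illustrative assumptions, not the settings used in the experiments.

```python
# Hypothetical toy data: citing article -> articles it references.
citations = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "F": ["D", "E"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing citations is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

def weighted_ratings(citations, rank):
    """Replace each boolean vote with the cited article's scaled PageRank."""
    top = max(rank.values())
    return {
        citing: {cited: rank[cited] / top for cited in refs}
        for citing, refs in citations.items()
    }

rank = pagerank(citations)
ratings = weighted_ratings(citations, rank)
```

Swapping `rank` for a dictionary of random numbers, or for a constant, gives the other two conditions compared above without touching the rest of the pipeline.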

I trust that Daniel Lemire is right about the value of negative results.

Digg Recommender July 29, 2008

Posted by Andre Vellino in Collaborative filtering.

I’m a little behind on my summer blogging (reading and writing) and seem to have missed Digg’s announcement for their recommendation feature.

I don’t Digg myself, but from the video demo (on the blog link above), it looks like they have struck a nice balance between pure collaborative filtering, topic classification and serendipity of recommendations (from different samples of users within a given similarity neighbourhood.)

Canada # 1 in Computer Science July 10, 2008

Posted by Andre Vellino in CISTI, Data Mining.

Glen Newton alerts us to a recent article published in Scientometrics from which he deduced that Canada is the #1 producer of Computer Science research papers (per capita). This doesn’t come as a complete surprise, given the #6 ranking that I had noted Canada holds in overall scientific publication output.

Distracted July 3, 2008

Posted by Andre Vellino in Collaborative filtering, Digital library, Information retrieval, Recommender service.

I heard journalist Maggie Jackson this morning speaking on the radio about her new book, Distracted: The Erosion of Attention and the Coming Dark Age.

Despite our wondrous technologies and scientific advances, we are nurturing a culture of diffusion, fragmentation, and detachment. In this new world, something crucial is missing–attention. Attention is the key to recapturing our ability to reconnect, reflect, and relax; the secret to coping with a mobile, multitasking, virtual world that isn’t going to slow down or get simpler. Attention can keep us grounded and focused–not diffused and fragmented.

The Wall Street Journal review of the book relates that:

“In the end, Ms. Jackson makes her way to a Buddhist monastery, where people are learning to practice samatha – that is, to exercise voluntary control over their attention. Mountain retreats may not be for everyone, but the spirit of such an effort makes obvious sense in an era of information glut and tech-driven interruptions. Of course, if samatha – or something like it – turns out to be a good idea, it will be blogged about, praised in group emails, discussed online and debated in instant messages. Work will just have to wait.”

So the answer to information overload may be to practice Buddhist meditation.

Alternatively, you could go for the “technology fix”. On the same radio show they interviewed Jon Herlocker, one of the founders of Smart Desktop, an outgrowth of his research on TaskTracer. Jon comes from the world of Collaborative Filtering – he founded the music recommender Music Strands (now MyStrands) and has also worked on recommending documents in a digital library.

Which do you think is more likely to work?