
Freebase March 18, 2007

Posted by Andre Vellino in Data Mining, Open Source.

Since I learned that Microsoft bought Medstory and discovered that this specialized medical search engine was developed by a couple of AI veterans (Alain Rappaport and Jay Tenenbaum) I have been wondering what the other luminaries of AI have been up to. Today I discovered that Danny Hillis (formerly of Thinking Machines) has founded Metaweb and its first product is Freebase.

I didn’t understand the value of Freebase at first, but noted with interest their plans to add all of BioMed Central’s open access content to it. The idea is pretty simple: it’s a “Creative Commons Database”. From the Freebase web site:

Freebase.com is home to a global knowledge base: a structured, searchable, writeable and editable database built by a community of contributors, and open to everyone. It could be described as a data commons.

The value of this idea has yet to be demonstrated, but I think this project is quite a bit more interesting than Cyc despite sharing some of Cyc’s world-view about the semantics of structured data. The difference between them is the difference between a web page and a wiki: with Freebase, users can contribute to and enrich not just the information but also the interrelations within it. In Cyc, the universe of knowledge is quite a bit more static.

What may well prove quite exciting is the point at which this information is intelligently and automatically data-mined for unexpected correlations and inferences. The next step has got to be “Open OLAP” and its associated data-visualization methods.

Kubuntu Woes March 16, 2007

Posted by Andre Vellino in Linux.

My home PC got fried tonight, and I think that attempting to boot it with Kubuntu was the cause.

Just so you can peg me right – I’m not the kind of guy who rebuilds the latest kernel in my spare time. But I’m not a Linux novice either. My stack of Linux distributions goes back to Yellow Dog for 68K Macs.


Taste in Summer of Code March 15, 2007

Posted by Andre Vellino in Collaborative filtering, Open Source, Recommender service.

It is a pity that CISTI was not selected for Google’s Summer of Code 2007, but it is in good company (e.g. GIMP). However, I’m happy to report that one of the projects that CISTI supports is being mentored by Taste, an open source Collaborative Filtering application developed by Sean Owen. Thanks to Daniel Lemire for suggesting the feature enhancement ideas.

If you are a student, have a look at the Taste Ideas Page and Google’s student application page. I look forward to helping out, if I can.

Library Portals Survey March 14, 2007

Posted by Andre Vellino in Collaborative filtering, Digital library, Recommender service.

Over the past couple of months, I have been looking at some of the personalization features in web portals for scientific digital libraries and specialized search engines. It seems to me that personalization has not yet penetrated very deeply in scientific libraries but that this is likely to change.

There are many portals that offer user-specific experiences and allow users to store queries, subscribe to e-mail alerts and customize some aspects of the portal experience. But the large portals owned by commercial science publishers (such as Web of Science, Scirus, BlackwellSynergy or ACM portal) and even open access publishers (like BioMed Central or Public Library of Science) are quite a bit less experimental than some of the specialized search engines.

I find it rather odd that Google, with its personal portal, stands out as a believer in the personalized web experience (different hybrid collaborative filtering / personalized search portals are also offered by Collarity and others). Google will keep track of your queries, display statistics about your search behaviour and offer recommendations for pages / videos / gadgets based on your search history. Yet the quality of Google’s recommendations is necessarily limited because of the diversity of queries and interests that any individual might bring to an all-purpose portal.

Scientific libraries, on the other hand, are a much better environment for personalization because the user’s interests and queries are so much more focused than in a commodity search engine. So they should be more successful at providing quality recommendations – assuming the data-sparsity problem can be overcome, which I think it can.
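To make the data-sparsity point concrete, here is a minimal sketch of user-based collaborative filtering. It is an illustration only, not how Taste, Google or any library portal actually works: the reader names, the paper identifiers and the `recommend` helper are all hypothetical. It shows that a reader who shares bookmarks with someone else gets a recommendation, while a reader with no overlap gets nothing.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (item -> score)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no items in common: the data-sparsity case
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not seen by similarity-weighted scores."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0.0:
            continue  # dissimilar or non-overlapping readers contribute nothing
        for item, score in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: readers and the articles they bookmarked (1 = bookmarked).
ratings = {
    "alice": {"paper_a": 1, "paper_b": 1},
    "bob": {"paper_a": 1, "paper_b": 1, "paper_c": 1},
    "carol": {"paper_x": 1},
}

print(recommend("alice", ratings))  # alice overlaps with bob
print(recommend("carol", ratings))  # carol overlaps with nobody
```

In a focused library, readers are more likely to overlap the way alice and bob do; in an all-purpose portal, many readers look like carol.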

One encouraging observation is that some of the more specialized scientific library portals such as the International Journal of Physics show that the application of commercial clustering technology (see Clusty by Vivisimo) can really help in the query refinement process. Consumer-oriented medical search engines like Medstory also return search results in clusters, but with bar-charted relevance feedback within the clusters. Once the clusters are no longer restricted to pay-per-view Wall Street Journal articles (maybe Microsoft will just buy out the WSJ :-) ), people will see how useful this is.

I am convinced that good recommender services are next on the agenda for progressive science libraries.

Summer of Code March 13, 2007

Posted by Andre Vellino in Digital library, Open Source.

It has been quite an interesting experience to apply to be a mentor for Google’s Summer of Code. This program offers student developers stipends (paid for by Google) to write code for various open source projects (Gnome / Apache / Joomla, you name it). Each project has a mentor who volunteers to supervise the student over a three-month period. The benefit to Google? Well, they say it’s not a recruitment program, but they also say that a beneficial side effect is that they use the results from this program to do some recruiting at the end. So it’s good for them, good for the open-source community and good for the students.

The challenge for us at CISTI was that we don’t currently host an open source code base. Yet we also want to mentor some students on projects that are of benefit to the digital library community and are feasible for a summer student project. Coming up with such projects was a challenge in itself, but we submitted a small but reasonable collection of ideas thanks to input from many sources (Daniel, Richard and Glen especially). We’ll find out in a couple of days…