
CISTI Sciverse Gadget App December 13, 2011

Posted by Andre Vellino in CISTI, Digital library, General, Information retrieval, Open Access.

Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed a (beta) Sciverse gadget that gives searchers and researchers a window on CISTI’s electronic collection: it takes the search term entered in the Elsevier Hub and returns CISTI’s search results from a database of over 20 million journal articles.

Next year, I plan to follow up with another Sciverse gadget for my citation-based recommender, one that uses the full power of Elsevier’s API to access its collection content.

I want to commend all and sundry at Sciverse Applications for this initiative. Opening up bibliographic data and providing developers with a platform (a customized version of Google’s OpenSocial platform) is exactly the right thing to do, both to benefit third parties (who get access to otherwise closed and proprietary data) and to enhance Elsevier’s own search and discovery environment.

There are, already, several advanced and interesting applications on Sciverse. My favourites are Altmetric (winner of the Science Challenge prize), NextBio’s Prolific Authors and Elsevier’s Table Download.

And there will be more to come. An open marketplace like this where the principles of variation and natural selection can operate will, I predict, make for a richer diversity of useful search and discovery tools than any single organization can develop on its own.

Google Books on Charlie Rose March 8, 2010

Posted by Andre Vellino in CISTI, Digital library, General, Open Access, Search.

Google Books discussion on Charlie Rose

I found this conversation about the “Google Books” library very interesting. Broadcast last night, it was between Robert Darnton (professor of American cultural history at Harvard and Director of the Harvard University Library), David Drummond (Chief Legal Officer at Google), bestselling author James Gleick and Charlie Rose (of PBS).

I was especially pleased to see Prof. Darnton insist on the need to guarantee “the public interest”.  Only he seemed to have the long view, though.

The (long tail) End of the Book December 4, 2009

Posted by Andre Vellino in CISTI, Digital library, General, Information.

I would venture to guess that Noah Richler (son of Mordecai) is not the first journalist to predict the demise of the book. In his article in the October edition of The Walrus, Richler says:

….the book industry’s digital future, one that was comfortably far off even as the music industry was being decimated, is now ineluctably and forcefully here.

Richler asserts that instead of propagating a greater variety of books to the general public via the effect of the long tail, the web has benefited blockbusters:

Rather than connecting the public to the glorious cornucopia of the Long Tail, the effect of the web has been to serve fewer blockbusters better.

Worse, he says:

Today, just a small number of books compete for consumers’ fleeting attention. The tail is longer, but it is also thinner.

Exacerbating the problem for the publishing industry is the increasing amount of free content, especially via Google Books:

By copying the books first and negotiating later, Google has, in effect, established itself as the biggest pirate in the world.


…digital is where the puck is going to be, and publishers have no choice but to skate toward it.

True enough, but “digital” also means “ephemeral” – which is fine for Robert Ludlum novels, but less so, perhaps, for On the Origin of Species and other milestones in the history of human knowledge.

Contra some bloggers who predict the end of the book by pointing to the growing sales of e-Readers like the Kindle, I think there’s a very important difference between the playback of video or music on iPods and the reading of books on e-Readers.

First, video and audio have to be viewed or heard with some kind of playback device (CD player / tape player / video player). Not so with print. You can read a paper book with no playback device, which means no up-front device costs (Kindle / iPod), no power requirements and invulnerability to electromagnetic pulses. Furthermore, paper books already have all the right digital rights management (DRM) mechanisms built in: you can borrow and lend them, you own them and you can resell them.

Second, the typical “unit” in the audio industry is the 5-minute song. The typical “unit” in literature (i.e. the book) is ~300 pages. The amount of time and attention required to “consume” a book is several orders of magnitude greater than for a song, and this makes a huge difference to how human beings like to ingest the content. I can spend hours reading a printed book (at 600 dots per inch), but less on a low-resolution device like a Kindle. Audio and video fare much better on digital devices, by comparison.

Last but not least is the impermanence inherent in digital formats and media. Entrusting our knowledge to digital formats (e.g. PDF) and media (e.g. hard disks) commits us to an ephemeral cultural and intellectual memory. While digitization has its virtues (e.g. searching, social tagging, clustering), it also harbours (invisible) dangers (e.g. digital rot, ease of forgery, dependence on rapidly changing software and hardware developed at great expense in the private sector).

The inherent conservatism in librarianship that values, organizes and manages paper is a welcome counterweight to the near-term myopia of digital early-adoption.  Both have their place in the 21st century but I worry about our putting all our eggs in the digital basket.

Diversity March 9, 2009

Posted by Andre Vellino in CISTI, Digital library, General.

Richard Dawkins and others have been talking about memes (the cultural analog of genes) for over a decade, and the topic came up once again in Daniel Dennett’s Darwin memorial lecture at Carleton a few weeks ago. It got me thinking about “diversity” in the non-biological world.

The importance of diversity in operating systems for security purposes has been known for a long time. Allowing the dominance of a monoculture of operating systems such as MS Windows makes computers and networks vulnerable to viruses and attacks. Fostering a diversity of OSes is a healthy thing for much the same reason that it is desirable to have genetic diversity in the biosphere.

Analogous arguments apply for encouraging variety in search engine algorithms. Relying only on Google’s or Yahoo’s secret sauce for ranking web sites could be detrimental to your “research health”. For one thing, commodity search engines cater (mostly) to the naive user with ordinary search needs. Hence the search results, and how they are ranked, are perhaps not optimal for a scientist or an academic, which is why we also have Google Scholar and other science-oriented search engines such as Scirus and CiteSeerX.

The same kind of argument can be made for variety in the formats in which text and data are stored. While there are frequent calls for standards and interoperability among text publishing formats, there are also a number of tools for format conversion (e.g. between PostScript and PDF), and the existing variety of text formats has value, just as the wide variety of photo formats (TIFF / JPG / PNG etc.) meets different needs in different niches.

In addition to variety among digital formats, though, I would like to add a post-script to my entry last week in support of the need for variety in storage media types. In a previous post I suggested some arguments in favour of paper formats instead of, or at least as a complement to, digital ones. Computers, disks and the digital objects stored on them all suffer from vulnerabilities (digital rot / availability / errors, etc.). Paper has its own vulnerabilities too, of course, but I often use it as a backup medium and it works well for that purpose.

The slothful pace at which libraries are transforming to adapt to 21st century technology may, in retrospect, be viewed as salutary. Look at how proud Canadians are now to have an old-fashioned, regulated banking system that didn’t succumb to “modern” financial instruments like mortgage derivatives.

Paper vs. Bytes February 24, 2009

Posted by Andre Vellino in CISTI, Digital library, General.

Until just last week, water-cooler conversations in our library sometimes turned to the question of whether a paper collection has value in the 21st century. The consensus seems to be that books and paper journals are out and that the future is digital. After all, paper is expensive to produce, transport and store. It also takes up space and can’t be searched or retrieved without meta-data and catalogues. In short, paper collections are inferior in every way to digital ones. So went film cameras and paper photography, after all.

Ever the contrarian, I sometimes argue the case for paper, at the behest of my bookish spouse (as you might expect of a professor of English literature).

Here then are some arguments for paper.

  1. Once produced, paper requires no further technology to access – no electricity, no computers, no software.
  2. The fact that paper takes up space and is expensive to store obliges the “stewards of content” (aka librarians) to be selective about what they accept and keep in their collections.  The high cost of publishing is only justifiable if the quality is high, hence an expensive storage vehicle increases the likelihood that what is preserved in libraries is high quality.
  3. Print has prestige (perhaps because of point 2). [See the January ’09 CBC Spark podcast on how newspapers are making a comeback]
  4. Compared with the 167 ppi of the Amazon Kindle’s current display, paper is easier on the human eyes for which content is (still, mostly) intended.
  5. Computers contribute to our individual and collective distraction.  Should we really be enhancing our tendency to juggle so many things at a time?

Of course, each of these arguments has counter-arguments too. Paper rots easily, requires computer technology to access it and buildings to protect it. Also, perhaps librarians shouldn’t be the arbiters of what is collected: that contradicts the (now popular) idea underlying “Everything is Miscellaneous”, wherein pretty much everything has “value”, depending on who you are and what your perspective is.

Still, I do think there’s a place for paper. I don’t think the phrase “digital library” will join the ranks of oxymorons like “paperless office”, but I do think (hope) that not all libraries of the future will be (entirely) digital. Each dimension of the library has its niche, I think.

The Mechanical Librarian February 19, 2009

Posted by Andre Vellino in CISTI, Collaborative filtering, General, Recommender, Recommender service.

I was scheduled to give a CISTI Seminar yesterday, entitled “The Mechanical Librarian”, but it was pre-empted by an address from NRC’s president.

I was primed to give the talk, though, because recommenders for scholarly digital libraries are coming of age and there’s lots to say about them.

The presentation (which you can view here) covers a lot of ground at quite a high level, including a brief screen-shot demo of the Synthese Recommender: it was intended for a general audience, mostly of librarians and information specialists. 

One chart lists some of the digital library recommenders that have been either developed or studied in the last 7 years:

TechLens (University of Minnesota) (2002)

  • Uses ACM Digital Library full text; mixed hybrid of collaborative filtering (CF) and content-based filtering (CBF)

BibTip (University of Karlsruhe) (2003)

  • Uses OPAC (Library Catalog) usage data for collaborative filtering

IngentaConnect (2007)

  • Uses Baynote (SaaS) customer tracking

DSpace (2008)

  • Content-based recommender based on user bookmarks

CiteULike (academic experiment 2008)

  • Collaborative filtering on user bookmarks from CiteULike

“bX” system from Ex Libris (2009)

  • Uses SFX resolver logs

NextBio (to be announced in March 2009)

  • Life sciences search engine that uses collaborative filtering + ontologies to suggest new content (trials / abstracts / data)
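Several of the systems listed above rely on collaborative filtering over usage data such as bookmarks. As an illustration only, here is a minimal sketch of generic item-based collaborative filtering in Python: each item is represented by the set of users who bookmarked it, item pairs are scored by cosine similarity of those sets, and a user is offered the unseen items most similar to what they have already bookmarked. The function names and toy data are hypothetical, and this is not the algorithm used by CiteULike, BibTip, bX or any other system in the list.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(bookmarks):
    """Cosine similarity between items, based on which users bookmarked them.

    bookmarks: dict mapping a user id to the set of item ids that user bookmarked.
    Returns a dict mapping (item_a, item_b) -> similarity, stored in both orders.
    """
    # Invert the data: for each item, the set of users who bookmarked it.
    users_by_item = defaultdict(set)
    for user, items in bookmarks.items():
        for item in items:
            users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(users_by_item[a] & users_by_item[b])
            if common:
                # Cosine similarity of two binary user vectors.
                s = common / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims


def recommend(user, bookmarks, k=3):
    """Return up to k items the user has not bookmarked, ranked by summed
    similarity to the items the user has already bookmarked."""
    sims = item_similarities(bookmarks)
    seen = bookmarks.get(user, set())
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy data: users and the papers they bookmarked.
    bookmarks = {
        "alice": {"paper1", "paper2"},
        "bob": {"paper1", "paper2", "paper3"},
        "carol": {"paper2", "paper3", "paper4"},
        "dave": {"paper1", "paper4"},
    }
    print(recommend("alice", bookmarks))  # ['paper3', 'paper4']
```

Here paper3 outranks paper4 for alice because both of the users who bookmarked paper3 also share her bookmarks. The pairwise loop compares every pair of items, so a recommender over a real digital library collection would need a more scalable approach.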

Let me know if I’ve missed something.