January 22, 2007

Andre Vellino

I think it is possible and would be useful to write software that helps users build “virtual collections” of scholarly articles according to themes or intersecting subjects of interest. This would encourage readers to make serendipitous discoveries that typically occur in browsing mode rather than searching mode.

The trend towards the electronic self-publishing of scholarly research produces an author-centric result: collections of articles that are produced by individuals and centered on the author’s interests. In many circumstances this is desirable. A good research scientist tends to produce consistently good research in a thematically coherent subject area, so if you have found a person at the top of his or her field, then chances are good that a collection of articles by that author will be a useful resource.

In contrast, collections of articles that are centered on subjects or interdisciplinary themes – such as conference proceedings or special issues of a journal – provide a diverse collection of researchers’ work in a given area. The quality of the collection as a whole then depends (in part) on the editor of the collection and the requirements for entry into the collection.

One big advantage of an article collection is that (typically) its content is heterogeneous and represents a diversity of viewpoints. If I am interested in a specific article in a collection, chances are good that I will also be somewhat interested in other articles in that collection. Furthermore, the heterogeneity of articles in collections provides the reader with breadth and may encourage the discovery of new research by serendipity. This is, typically, the value that browsing has over searching.

One simple way to create “virtual collections” of scholarly research articles would be to write a compound query that searches several attributes of the corpus and its metadata at once: full text, abstract, author, date ranges, citation attributes, subject, etc. Such a query would return a list of relevant results, which a human being could then “edit” manually and store as a virtual collection.
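As a rough illustration of the idea, here is a minimal Python sketch of a compound query plus manual editing. Everything in it is an assumption made for the example: the `Article`, `CompoundQuery` and `VirtualCollection` names, the handful of metadata fields, the naive substring match standing in for full-text search, and the made-up sample corpus. A real system would sit on top of a proper search index.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Article:
    # Hypothetical, simplified metadata record for one article.
    id: str
    title: str
    authors: tuple
    year: int
    subjects: frozenset
    abstract: str = ""


@dataclass
class CompoundQuery:
    # Each criterion is optional; an article must satisfy all that are set.
    text: Optional[str] = None          # substring of title or abstract
    author: Optional[str] = None        # exact author name
    year_range: Optional[tuple] = None  # (first_year, last_year), inclusive
    subjects: frozenset = frozenset()   # subjects the article must all carry

    def matches(self, article: Article) -> bool:
        haystack = (article.title + " " + article.abstract).lower()
        if self.text and self.text.lower() not in haystack:
            return False
        if self.author and self.author not in article.authors:
            return False
        if self.year_range:
            first, last = self.year_range
            if not first <= article.year <= last:
                return False
        return self.subjects <= article.subjects


@dataclass
class VirtualCollection:
    # A stored query plus the ids a human editor has removed by hand.
    name: str
    query: CompoundQuery
    excluded: set = field(default_factory=set)

    def articles(self, corpus):
        return [a for a in corpus
                if self.query.matches(a) and a.id not in self.excluded]


# Made-up sample corpus, for illustration only.
CORPUS = [
    Article("a1", "Recommender systems for digital libraries", ("Author A",),
            2005, frozenset({"recommender systems", "digital libraries"})),
    Article("a2", "Tagging in digital libraries", ("Author B",),
            2006, frozenset({"digital libraries"})),
    Article("a3", "Protein folding", ("Author A",),
            2004, frozenset({"biology"})),
]

collection = VirtualCollection(
    "Digital libraries 2005-2007",
    CompoundQuery(text="digital libraries", year_range=(2005, 2007)),
)
```

Calling `collection.articles(CORPUS)` returns the two digital-library articles; adding `"a2"` to `collection.excluded` is the manual “edit” step and leaves only the first.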

Such queries could then be stored and shared with like-minded users, much as bookmarks are shared today. Alternatively, a search for an individual article could return a result indicating that the article also belongs to a set of “virtual collections” that could be browsed.
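Both ideas can be sketched in a few lines of Python. This is only a sketch under stated assumptions: queries and articles are plain dictionaries with hypothetical `subjects`, `years` and `id` keys, JSON is an arbitrary choice of sharing format, and the function names are invented for the example.

```python
import json


def matches(query: dict, article: dict) -> bool:
    # An article matches when it carries every requested subject
    # and its year falls inside the (inclusive) year range.
    first, last = query.get("years", (float("-inf"), float("inf")))
    wanted = set(query.get("subjects", []))
    return wanted <= set(article["subjects"]) and first <= article["year"] <= last


def export_collection(name: str, query: dict, excluded=()) -> str:
    # Serialize a stored query, with its manual exclusions, for sharing.
    return json.dumps({"name": name, "query": query, "excluded": sorted(excluded)})


def import_collection(payload: str) -> dict:
    # Load a collection that another user shared.
    return json.loads(payload)


def collections_containing(article: dict, collections: list) -> list:
    # Reverse lookup: the names of the virtual collections an article belongs to.
    return [c["name"] for c in collections
            if article["id"] not in c["excluded"] and matches(c["query"], article)]
```

A user would pass the string from `export_collection` around like a bookmark; whoever receives it calls `import_collection`, and a search front end could call `collections_containing` on each hit to show which collections it can be browsed from.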


1. Richard Akerman - January 24, 2007

But surely Connotea or similar academic article bookmarking software lets one create virtual collections using tags.

It seems to me you are mixing manually-created (“self-curated”) collections with automatically generated virtual collections.

2. Andre Vellino - January 24, 2007

Yes, you are quite right that you could use Connotea to build virtual collections that way. (Another reader suggested Google queries + TinyURL to achieve the same effect.)

What I wanted was a search engine that enabled you to express (and store) a sufficiently complex search query (on all the bibliographic metadata, the citation index, semantic information in the full text, plus the tags, etc.) and enabled you to exclude irrelevant results (e.g. by deselecting elements of the result set).

It would be nice, as a second and less simple step, to build virtual collections automatically from usage data (with collaborative filtering), text analysis, etc. I think the automated curation of collections might be “induced” (by example) from the manually curated kind.
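To make the “induced by example” idea a little more concrete, here is a small Python sketch of item-based collaborative filtering: starting from a few manually chosen seed articles, it suggests other articles whose readership overlaps with theirs. The function names, the Jaccard similarity measure and the made-up usage data are all assumptions chosen for the example, not a description of any existing system.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    # Overlap between two sets of readers, from 0.0 to 1.0.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(seed_ids: set, usage: dict, top_n: int = 3) -> list:
    # usage maps a reader id to the set of article ids that reader has read.
    # Invert it so each article maps to the set of its readers.
    readers = defaultdict(set)
    for reader, articles in usage.items():
        for article_id in articles:
            readers[article_id].add(reader)

    # Score every non-seed article by its summed similarity to the seeds.
    scores = {}
    for candidate, candidate_readers in readers.items():
        if candidate in seed_ids:
            continue
        score = sum(jaccard(candidate_readers, readers.get(seed, set()))
                    for seed in seed_ids)
        if score > 0:
            scores[candidate] = score

    # Highest score first; ties are broken by article id.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


# Made-up usage data, for illustration only.
USAGE = {
    "u1": {"a1", "a2", "a3"},
    "u2": {"a1", "a2"},
    "u3": {"a2", "a4"},
    "u4": {"a5"},
}
```

With this data, `suggest({"a1"}, USAGE)` proposes `a2` and then `a3`, because their readers overlap with those of `a1`, while `a5` is never suggested. A human curator could still accept or reject each suggestion.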

