
The Future of Universities is Here July 19, 2012

Posted by Andre Vellino in Open Access, Universities.

An impressive list of 16 universities (including the Ecole Polytechnique Federale de Lausanne and the University of Edinburgh) have now signed up with Coursera to offer free on-line courses.  I audited one a few months ago on Natural Language Processing (from Stanford) to see what it was like – it was stunningly good.

My very first thought was “the future of conventional universities is in doubt”. This course alone had 42,000 registrants, 24,000 of whom watched at least one video. Only 1,400 of the registrants got a “certificate of achievement” (i.e. completed the course and handed in all the assignments) but in the meantime there were 800,000 video-downloads of the courseware.

Distance-learning or on-line courses have been around for a long time – in the same way that “finger”, “who” and “chat” in Unix had been around a long time before Facebook, Linked-In and Instant Messaging.  The difference now is that major Universities are jumping on the bandwagon and offering them for free.  Why? Perhaps because of decreasing enrolment: free on-line courses are a way to recruit students from everywhere and to show them the best of what universities have to offer.

But also (in the US anyway), education is a business (see the Frontline documentary on the business of higher education: College Inc.). That universities are feeling the financial pinch and being pressed by their boards to be more aggressive in the marketplace was perhaps most visibly illustrated at the University of Virginia (the case against on-line education is elegantly articulated by Mark Edmundson – a professor of English at the University of Virginia – in a New York Times OpEd article).

Making on-line courses available for free will be a moneymaker when they start counting towards a degree, which is clearly inevitable in the long run. However, I didn’t expect this development to come so soon after the beginning of the experiment. The Seattle Times reported just yesterday that the University of Washington is going to be offering some of their Coursera courses for credit.

Canada, in the meantime, has its own Canadian Virtual University which lists over 2,000 courses and 300 degrees and diplomas available on-line. The difference with Coursera is that the CVU is not free.

Anyone see any parallels with the publishing industry here?

Elsevier Boycott – Academics, Get a Grip! February 25, 2012

Posted by Andre Vellino in Open Access.

At the risk of being shunned by the now 7,000+ prestigious colleagues who are actively boycotting Elsevier, I’d like to appeal to the better angels of their nature and ask them to stop whipping up a frenzy of outrage and indignation that pits Elsevier (“axis of evil”) against Us (“freedom of thought”). I worry that this polarization of the issues is clouding our individual and collective judgement about what the fundamental problems are and what can and should be done about them.

It is undeniable that there are real and serious problems with academic publishing (as pointed out very cogently by Fields Medalist Tim Gowers here, John Dupuis (Head of York’s Science Library) here and Barbara Fister in the Library Journal here). And the Open Access movement is one I support. The concentration of control over journals by one for-profit publisher is clearly one of the core problems and the questionable practices (e.g. “bundling”) that they can consequently employ are another.

But who (or rather what) exactly is to “blame” (if that’s the right thing to do) for this situation? Elsevier is behaving rationally – from a market-forces point of view anyway. Maximizing profits is what any private enterprise does, particularly one that is publicly traded on stock exchanges. Elsevier (the publisher) is owned by Reed Elsevier which also owns Lexis Nexis (which offers law information and services) and Reed Elsevier Business (which provides data services, information and marketing solutions to businesses). Is this a portfolio mix that should be permitted by law? After all there are anti-trust laws that prohibit monopoly ownership in other domains.

One fundamental problem is that a public good (knowledge) has been commoditized, marketed and sold by a private, for-profit enterprise. The officials within Elsevier who are in charge of the company don’t have a lot of room to manoeuvre if they are to comply with the stock-market forces that urge them to forever greater profitability.

Here’s a suggestion to the signatories of the Elsevier boycott: go to your pension-fund manager (university or government) and find out if any of the mutual funds, exchange-traded funds or stock portfolios they own have stock in Reed Elsevier. I’m willing to bet they do. Pressure them to boycott those investments – I’m willing to bet that will have more influence.

Of course, the academic boycott has been heard, as evidenced by Elsevier’s open-letter reply of February 6th. That is one way to precipitate some kind of change towards greater openness of intellectual output. But let’s not delude ourselves into thinking that this is going to address the root problem: the inadequate funding of publicly-owned channels of knowledge dissemination.

Instead, could we harness this desire for change towards lobbying governments for more funding for university and independent open-access publishers (and tone down the rhetoric against Elsevier a little)?

P.S. I think it’s pretty important, for this post especially, to make it clear that these are my personal opinions (as are all my blog posts here) and in no way reflect the views of my employer.

CISTI Sciverse Gadget App December 13, 2011

Posted by Andre Vellino in CISTI, Digital library, General, Information retrieval, Open Access.

Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed a (beta) Sciverse gadget that gives searchers and researchers a window on CISTI’s electronic collection by taking the search term entered in Elsevier Hub and providing them with CISTI’s search results from a database of over 20 million journal articles.

Next year, I plan to follow up with another Sciverse gadget for my citation-based recommender that uses the full power of Elsevier’s API into its collection content.

I want to commend all and sundry at Sciverse Applications for this initiative.  Opening up bibliographic data and providing developers with a developer platform (a customized version of Google’s OpenSocial platform) is exactly the right kind of thing to do both to benefit third parties (they get access to an otherwise closed and proprietary data) and to enhance their own search and discovery environment.

There are, already, several advanced and interesting applications on Sciverse. My favourites are: Altmetric (winner of the Science Challenge prize – see YouTube demo video below), NextBio’s Prolific Authors and Elsevier’s Table Download.

And there will be more to come. An open marketplace like this where the principles of variation and natural selection can operate will, I predict, make for a richer diversity of useful search and discovery tools than any single organization can develop on its own.

The Cost (vs. Value) of Data Curation October 2, 2010

Posted by Andre Vellino in Data, Open Access.

There is a tension between the cost (to the curator) of data-curation and the potential value (to others) of making data (e.g. data from scientific experiments) available. For the purposes of selection, it would be nice to know ahead of time whether the data you wish to make available (now) is ever going to have value (in the future).

Unfortunately, you can’t predict that ahead of time because you don’t know (i) who your data-users might turn out to be or (ii) how the circumstances might change that turn what was previously an irrelevant-seeming piece of data into a planet-saving one.

Indeed it’s impossible to know how any element of data might be used for any given purpose and by whom. For instance, consider whether the Nuclear Magnetic Resonance spectra that you have collected for the purpose of analyzing the structure and composition of a pathogen might not be fruitfully reused in the future for the purpose of understanding the bias of an improperly calibrated instrument or indeed (for technology historians of the future) how the (what may then be “primitive”) NMR spectroscopy technology was used in the 20th and early 21st centuries.

So, how are we to interpret Weinberger’s advice in “Everything is Miscellaneous”?:

  • “The solution to overabundance of information is more information”
  • “Filter on the way out, not on the way in”
  • “Put each leaf on as many branches as possible”
  • “Everything is metadata and can be a label”
  • “Give up control”
  • “A ‘topic’ is anything someone somewhere is interested in.”
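
Two of these maxims, “put each leaf on as many branches as possible” and “filter on the way out, not on the way in”, translate fairly directly into code when applied to data sets. The sketch below is only an illustration in Python: the `DataCatalogue` class, its method names and the sample data set identifiers are all hypothetical and do not correspond to any existing API.

```python
from collections import defaultdict


class DataCatalogue:
    """Hypothetical in-memory catalogue: each data set sits under every label it carries."""

    def __init__(self):
        # Maps a lower-cased label to the set of data set ids filed under it.
        self._by_label = defaultdict(set)

    def publish(self, dataset_id, labels):
        """Accept everything on the way in; just file it under each label."""
        for label in labels:
            self._by_label[label.lower()].add(dataset_id)

    def tag(self, dataset_id, label):
        """Let any user hang an existing data set on one more branch."""
        self._by_label[label.lower()].add(dataset_id)

    def find(self, *labels):
        """Filter on the way out: return the ids that carry all requested labels."""
        if not labels:
            return set()
        matches = [self._by_label.get(label.lower(), set()) for label in labels]
        return set.intersection(*matches)


catalogue = DataCatalogue()
catalogue.publish("nmr-001", ["NMR", "pathogen", "spectra"])
catalogue.publish("nmr-002", ["NMR", "calibration"])
# A later user finds a new use for old data and labels it accordingly.
catalogue.tag("nmr-001", "calibration")

print(sorted(catalogue.find("nmr", "calibration")))
```

Nothing is rejected at publication time in this sketch; all the selection happens in `find`, and every extra label makes a data set reachable by one more route.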

The cash value of this advice for data: publish as much data as you can; give users as many ways as you can to let them get at it (e.g. APIs but also user-interfaces); give users as many ways as you can to add more data (tags, metadata, text, links to other data – viz. “linked data”).

Which is fine advice if you assume that publishing data, like putting text and images on the internet, is (almost) free. But publishing data isn’t (yet) close to free.  Why? Because it (still) needs to be curated by someone who understands how to annotate it in at least the obvious ways in which it may be useful – e.g. to other contemporary scientists.

Prediction: either scientists will have to be trained to become data-curators or the process of creating data will have to generate the metadata or e-librarians will have to be trained in the sciences (inclusive sense of “or”).

Scientific Research Data August 23, 2010

Posted by Andre Vellino in Data, Information, Information retrieval, Open Access.

Scientific research data is, without a doubt, a central component in the lifecycle of knowledge production. For one thing, scientific data is critical to the corroboration (or falsification) of theories. Equally important to the process of scientific inquiry is making this data openly available to others – as is vividly demonstrated by the so-called “ClimateGate” controversy and the more recent cloud over Marc Hauser’s research data on primate cognition. The public accessibility of data enables open peer review and encourages the reproducibility of results.

Hence the importance of data management practices in 21st century science libraries: the curation of, access to and preservation of scientific research data sets will be critical to the future of scientific discourse.

It is true that “Big Science” has been in the business of curating “reference data” for years. Institutional data centers in many disciplines have been gathering large amounts of data in databases that contain the fruit of years of research. GenBank, for instance, is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (containing over 150,000 sequence records.)

However, other kinds of data gathered by scientists are either transient or highly context-dependent and are not being preserved for the long-term benefit of future research either by individuals or by institutions. This might not be so serious for those data elements that are reproducible – either by experiment or simulation – but much of it, such as data on oil-content and dissipation rates in the Gulf of Mexico water column in 2010, is uniquely valuable and irreproducible.

As I indicated in a previous post, one development that will help redress the problems endured by small, orphaned and inaccessible datasets is the emergence of methods of uniquely referencing datasets such as the data DOIs that are being implemented by DataCite partners.  With the combination of data-deposit policies by science research funding agencies (such as NSF in the US and NSERC in Canada) and peer recognition from university faculty, contributions to data repositories, data publication and data referencing will soon grow to match the present status of scholarly publications.

In parallel, the growing “open access for Data” movement and other initiatives to increase the availability of data generated by government and government-funded institutions (including NASA, the NIH and the World Bank) are now well underway in a manner consistent with the OECD’s principles, which, incidentally, offer a long and convincing list of economic and social benefits to be obtained from making scientific research data accessible.

In particular, the United States, the UK and Australia are spearheading the effort of making public and scientific research data more accessible. For instance, in the U.S., the National Science and Technology Council (NSTC)’s recent report to President Obama details a comprehensive strategy to promote the preservation of and access to digital scientific data.

These reports and initiatives show that the momentum is building globally to realize visions that have been articulated in principle by several bodies concerned with the curation and archiving of data in the first decade of the 21st century (see To Stand the Test of Time and Long Lived Scientific Data Collections).

In Canada, several similar reports such as the Consultation on Access to Scientific Research Data and the Canadian Digital Information Strategy also point to the need for the national stewardship of digital information, not least scientific data sets. Despite much discussion, systematic efforts in the stewardship of Canadian digital scientific data sets are still only at the preliminary stages.  While there are well managed and curated reference data in domains such as earth science (Geogratis) and Astronomy (Canadian Astronomy Data Centre) which have a community of specialist scientific users and whose needs are generally well met, the data-management needs of individual scientists in small, less well-funded research groups are not being met: their data is either impossible to find or lost.

One impediment to the effective bibliographic curation of data sets is the absence of common standards. There are currently “no rules about how to publish, present, cite or otherwise catalogue datasets.” [Green, T (2009), “We Need Publishing Standards for Datasets and Data Tables”, OECD Publishing White Paper, OECD Publishing]

CISTI’s Gateway to Scientific Data sets and other such national sites (e.g. the British National Archives of Datasets) that aggregate information about data sets use bibliographic standards (e.g. Dublin Core) for representing meta-data.  The advantage is that these standards are not domain-dependent yet sufficiently rich to express the core elements of the content needed for archiving, storage and retrieval.  However, these metadata standards, developed for traditional bibliographic purposes, are not (yet) sufficiently rich to fully capture the wealth of scientific data from all disciplines, as I argued in a previous post.
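
As a rough sketch of what a Dublin Core description of a data set might look like, here is a small record built with Python’s standard library. The Dublin Core element names and namespace are real; the data set described, its identifier, the `<record>` wrapper and the `describe_dataset` helper are made up for illustration and are not taken from CISTI’s Gateway.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe_dataset(fields):
    """Build a simple XML record from (element, value) pairs.

    `fields` is a list of tuples so that an element such as `subject`
    can be repeated. The <record> wrapper is an arbitrary choice.
    """
    record = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Hypothetical data set: every value below is invented for the example.
record = describe_dataset([
    ("title", "NMR spectra of a bacterial pathogen"),
    ("creator", "Example Research Group"),
    ("subject", "nuclear magnetic resonance"),
    ("subject", "pathogens"),
    ("type", "Dataset"),
    ("format", "text/csv"),
    ("date", "2010-08-23"),
    ("identifier", "example-dataset-0001"),
])

print(ET.tostring(record, encoding="unicode"))
```

Nothing in such a record says how the spectra were acquired or how the instrument was calibrated, which is the kind of discipline-specific detail that generic bibliographic metadata leaves out.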

One of the major concerns when deciding on the feasibility of creating a data repository is the cost associated with the deposit, curation and long-term preservation of research data. Typically, costs depend on a variety of factors including how each of the typical phases (planning, acquisition, disposal, ingest, archive, storage, preservation and access services) is deployed (see the JISC reports “Keeping Research Data Safe” Part 1 and Part 2). The costs associated with different data collections are also likely to vary considerably according to how precious (rare/valuable) the stored information is and what the requirements are for access over time.

One point to note from the “Keeping Research Data Safe” reports commissioned for JISC is that

“the costs of archiving activities (archival storage and preservation planning and actions) are consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access.”

In short – librarianship for datasets is critical to the future of science, and technology costs are the least of our concerns.

Springer Open (Access) June 29, 2010

Posted by Andre Vellino in Open Access.

The science publisher Springer has announced that it has fully adopted the open access model for its on-line journals: Springer Open!

Not only is that a progressive move, it’s an economic necessity. As academic libraries are cutting back on subscriptions to deal with budget cuts and publishers increase their subscription fees, the net result of the traditional economic model can only spell disaster, as evidenced by the recent and public battle between the University of California and Nature Publishing Group.

Making authors and research funders pay for academic publishing and giving away access to readers seems to be the only viable model left. I think it’s only a matter of time before other academic publishers follow suit.

I worry a little about independent researchers who don’t have the thousands of dollars in grant money that are going to be required to engage in the peer-reviewed publishing process. University budgets are being squeezed too and the Open Access model is going to add pressure on that part of the overall academic publishing  ecosystem.