
International Digital Curation Conference 2015 February 17, 2015

Posted by Andre Vellino in Data, Data Curation.

I had never intended to leave this blog void of entries in 2014, let alone leave it with a “top 10” list as the last entry. So it’s time to re-boot Synthese with a short report on the 2015 International Digital Curation Conference.

The opening keynote by Tony Hey was both a master-class in how to give a compelling lecture and an impressive demonstration of how much one person can know about his field.  When the video of this talk comes out, watch it!

It was also great to see such a wide variety of topics in the poster sessions: a poster on data citation was the award winner (I still can’t believe that the graduate student who did this work had to pay for her own subscription to Web of Science!). The runner-up award for best paper went to a paper on attaching authorship attribution metadata to climate datasets.

Climate data figured quite prominently, including in at least three talks: one on implementing the ISO-standard MOLES3 (Metadata Objects Linking Environmental Sciences) at the Centre for Environmental Data Archival; a second on Twenty years of data management in the British Atmospheric Data Centre; and my own on Harmonizing metadata among diverse climate change datasets.

There were three parallel sessions on the second day, so one just has to be resigned to giving up two thirds of the interesting talks. I did go to this one: A system for distributed minting and management of persistent identifiers, which I found especially intriguing. In a sentence, it proposes to do for digital identifiers (e.g. DOIs) what Bitcoin does for money. In other words, it’s a Bitcoin-like, distributed and secure method of generating unique identifiers. I hope it succeeds.
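To make the analogy concrete, here is a toy Python sketch, entirely my own illustration and not the system presented in the talk, of how a Bitcoin-style hash chain with a miniature proof-of-work could mint identifiers that are unique and tamper-evident (the names and the `difficulty` parameter are made up for the example):

```python
import hashlib

def mint_identifier(previous_id: str, payload: str, difficulty: int = 2) -> str:
    """Mint an identifier by searching for a nonce whose SHA-256 digest starts
    with `difficulty` zero hex digits (a miniature proof-of-work). Chaining each
    new identifier to the previous one makes the sequence tamper-evident."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{previous_id}:{payload}:{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return digest
        nonce += 1

# Each identifier commits to its predecessor, like blocks in a blockchain.
genesis = "0" * 64
id_a = mint_identifier(genesis, "dataset-A")
id_b = mint_identifier(id_a, "dataset-B")
```

In a real distributed system, agreement on the chain (rather than a central registrar) is what would prevent two parties from minting the same identifier; the sketch above only shows the chaining idea.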

This talk by Ph.D. student Tiffany Chao, Mapping methods metadata for research data, struck me as a perfect application for text mining. She proposes extracting the Methods and Instrumentation sections from the National Environmental Methods Index to generate metadata descriptors for the corresponding data files. Right now the extraction is being done by hand to demonstrate its feasibility, but a machine could do it too.
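As a hint of how such an extraction might be automated, here is a minimal sketch (my own illustration with made-up headings and sample text, not Chao’s method or the NEMI document format) that pulls a named section out of a document with a regular expression:

```python
import re

# Hypothetical set of section headings; a real pipeline would learn or
# configure these from the source documents.
HEADINGS = ["Abstract", "Methods", "Instrumentation", "Results"]

def extract_section(document: str, heading: str) -> str:
    """Return the text between `heading` and the next known heading (or the
    end of the document)."""
    others = "|".join(re.escape(h) for h in HEADINGS if h != heading)
    pattern = rf"(?ms)^{re.escape(heading)}\s*$\n(.*?)(?=^(?:{others})\s*$|\Z)"
    match = re.search(pattern, document)
    return match.group(1).strip() if match else ""

doc = """Abstract
A study of stream chemistry.

Methods
Water samples were filtered and analysed by ion chromatography.

Results
Nitrate levels varied seasonally.
"""
print(extract_section(doc, "Methods"))
# → Water samples were filtered and analysed by ion chromatography.
```

Real scientific documents are far messier than this, which is presumably why the manual feasibility study comes first, but the core extraction step is mechanizable.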

I registered for a DataCarpentry workshop to “access life science data available on the web”.  I learned a little R programming, discovered the ROpenSci repository and got my feet wet with the AntWeb and Gender packages. I look forward to graduating to rWBclimate, an R interface to the World Bank climate data in the climate knowledge portal.

One treasure trove led to another. I gate-crashed a small visualization hackathon workshop at which I discovered the British Library’s digital collection and the 1001 things that could be done with it if you had a small army of graduate students in the Digital Humanities at your disposal. Hopefully, that’s exactly what’s going to happen when the Universities of Cambridge, Oxford, Edinburgh, Warwick and University College London start collaborating at the Alan Turing Institute (to be located in the British Library).

The Data Spring Workshop was exciting in a different way: a lot of presenters gave lightning talks on their practical problems and solutions with managing data. There was so much that I can hardly remember any of it! One item stood out for me, though, because it addresses my pain: a method for re-creating and preserving the environments for computational experiments. It took me about 1.2 minutes to become an instant convert to the Recomputation.org mission.

This only skims the surface, but it will have to do for now.