MetaData in Social Science
February 13, 2007
Posted by Andre Vellino in Data Mining.
I was aware that there are repositories of and search engines for many databases in various “hard science” disciplines like Chemistry and Astronomy, but until a few weeks ago, it hadn’t occurred to me that social scientists also have large and valuable collections of digital data and that these too are “published”.
Social science data consist of demographic studies, polls, censuses, epidemiological studies, etc. and are typically the result of surveys. Because survey responses can identify individual respondents, the raw data must be protected, e.g. by tight access controls and private networks. In Canada, data gathered by Research Data Centers are protected by the Statistics Act, which includes provisions for gaol sentences for people who misuse protected information.
But there are also public sources of social science data where aggregate information is anonymized so that personal identity information cannot be inferred. And, as with the open source movement in software and the open access movement in scholarly publishing, social science data in Canada has its very own liberation movement.
Social scientists’ ability to extract any meaning from this information depends critically on metadata. For example, a column in a spreadsheet or data file whose contents are “M” or “F” might refer to the sex of the respondent to the questionnaire, but it is not, in general, possible to determine that this is what the data mean without also having the corresponding questionnaire at hand and a human being to make the connection.
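To make the point concrete, here is a minimal sketch in Python of what a piece of metadata (a "codebook" entry) adds to a raw column. Everything in it is hypothetical: the column name `q3`, the question wording, and the layout of the `codebook` dictionary are assumptions chosen for illustration, not taken from any real survey or standard.

```python
# Hypothetical raw survey rows: on their own, the values in "q3" are just letters.
raw_rows = [
    {"resp_id": 1, "q3": "M"},
    {"resp_id": 2, "q3": "F"},
]

# Hypothetical codebook entry tying the column back to its questionnaire item.
codebook = {
    "q3": {
        "label": "Sex of respondent",
        "question": "What is your sex?",
        "categories": {"M": "Male", "F": "Female"},
    }
}


def decode(row, codebook):
    """Return a copy of row with documented columns renamed and decoded."""
    decoded = {}
    for column, value in row.items():
        entry = codebook.get(column)
        if entry is None:
            # No metadata for this column: keep it exactly as it was.
            decoded[column] = value
        else:
            # Replace the code with its label; unknown codes pass through.
            decoded[entry["label"]] = entry["categories"].get(value, value)
    return decoded


decoded_rows = [decode(row, codebook) for row in raw_rows]
print(decoded_rows)
```

Without the `codebook` dictionary, nothing in `raw_rows` says what `q3` measures; with it, the same rows can be read by someone who never saw the questionnaire.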
Yet, compared to bibliographic metadata, social-science metadata is a poor second-cousin. In the worst case, a data-file is a meaningless sequence of numbers whose significance is entirely opaque to anyone but the social scientist who created it. In the best case the raw data is annotated in some markup language like XML using a standardized DTD for social-science metadata.
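As a rough illustration of the best case, the sketch below reads variable descriptions out of a small XML fragment using Python's standard `xml.etree.ElementTree` module. The fragment and its element names (`codeBook`, `var`, `labl`, `qstn`, `catgry`, `catValu`) are assumptions made up for this example; they are not copied from a real schema or validated against any DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; element names are illustrative assumptions.
XML = """
<codeBook>
  <var name="q3">
    <labl>Sex of respondent</labl>
    <qstn>What is your sex?</qstn>
    <catgry><catValu>M</catValu><labl>Male</labl></catgry>
    <catgry><catValu>F</catValu><labl>Female</labl></catgry>
  </var>
</codeBook>
"""


def read_variables(xml_text):
    """Map each variable name to its label, question text and category labels."""
    root = ET.fromstring(xml_text)
    variables = {}
    for var in root.findall("var"):
        # Each <catgry> pairs a stored code with a human-readable label.
        categories = {
            cat.findtext("catValu"): cat.findtext("labl")
            for cat in var.findall("catgry")
        }
        variables[var.get("name")] = {
            "label": var.findtext("labl"),
            "question": var.findtext("qstn"),
            "categories": categories,
        }
    return variables


variables = read_variables(XML)
print(variables["q3"]["label"])
print(variables["q3"]["categories"])
```

The point of a shared, standardized vocabulary is that a loader like `read_variables` only has to be written once, rather than once per researcher's private file layout.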
Unfortunately, there aren’t many software tools that let social scientists create, annotate and analyse their data according to a commonly accepted standard. The Council of European Social Science Data Archives and the Inter-university Consortium for Political and Social Research have developed a metadata standard called DDI (Data Documentation Initiative), and there are some software systems (such as Nesstar) for storing, analyzing and meta-tagging data.
The latest version of the DDI standard (version 3), due for ratification in July 2007, introduces metadata tags that take into account the full life-cycle of creation and publication of social science data, including tags for conventional bibliographic metadata (author, subject, publication date, etc.). So the future for electronic social science looks promising. But it’s a pity that there wasn’t more forethought in the creation of metadata standards for social science in the 1970s or 80s. We might know more about ourselves as a society than it appears we currently do.