Building a Better Citation Index
March 20, 2012
Posted by Andre Vellino in Citation, Data, Open Source.
Scholars in a variety of disciplines (not just bibliometrics!) have been building better measures of scholarly output. First came the H-index in 2005 followed by the G-index in 2006, and these are now part of the standard measures for scholarly output.
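For readers who haven't met these two measures, here is a minimal sketch of how they are usually defined: the H-index (Hirsch) is the largest h such that h of an author's papers have at least h citations each, and the G-index (Egghe) is the largest g such that the g most-cited papers together have at least g² citations. The function names and the sample citation counts are made up for illustration, and this variant caps g at the number of papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers total at least g*g citations.

    Assumption: g is capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's five papers.
papers = [10, 8, 5, 4, 3]
print(h_index(papers), g_index(papers))
```

Note that both measures only see citation counts: a passing mention and a foundational reference each add exactly one to the tally.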
However, as Daniel Lemire points out in his latest blog post, raw citation counts are pretty crude data. In any given article, it’s often hard to tell which of the (typically) dozens of references are merely “en passant” (to fend off the critics who might think you haven’t read the literature) or otherwise incidental, and which are central to the substance of the article. What’s interesting for the authors of the articles being cited is the question “how critical is this citation to the author who cited me?”
One way to find out (and hence, perhaps, to build a better citation measure) is to train a Machine Learning algorithm to extract “key citations”, by analogy with extracting “key phrases” from a text (see Peter Turney’s 2000 article Learning Algorithms for Keyphrase Extraction). As a starting point, we’d like to compile data from researchers by asking them the question: “What are the key references of your papers?”
It will take 10 minutes: please fill out this Google Docs questionnaire. In it we ask you, as the author of an article, to tell us which 1, 2, 3 or 4 references are essential to that article. By an essential reference, we mean a reference that was highly influential or inspirational for the core ideas in your paper; that is, a reference that inspired or strongly influenced your new algorithm, your experimental design, or your choice of a research problem.
When this survey is completed, we will release the resulting data set under the ODC Public Domain Dedication and Licence so that you can use this data in other ways, if you wish.