
Building a Better Citation Index
March 20, 2012

Posted by Andre Vellino in Citation, Data, Open Source.

Scholars in a variety of disciplines (not just bibliometrics!) have been developing better measures of scholarly output. First came the h-index in 2005, followed by the g-index in 2006, and both are now among the standard measures.
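For readers who haven't met these two measures, here is a minimal Python sketch that computes both from a list of per-paper citation counts. The function names are our own. The definitions follow the usual ones: h is the largest number such that h papers each have at least h citations, and g is the largest number such that the g most-cited papers together have at least g² citations. One assumption to flag: this version caps g at the number of papers, whereas some variants pad the list with uncited papers.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers together have at least g**2 citations.

    Assumption: g may not exceed the number of papers (no zero-citation padding).
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    total = 0  # running sum of citations of the top `rank` papers
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    counts = [25, 8, 5, 3, 3]
    print("h =", h_index(counts), "g =", g_index(counts))
```

For example, with per-paper counts [25, 8, 5, 3, 3], `h_index` returns 3 and `g_index` returns 5: the single heavily cited paper lifts the g-index but not the h-index. Neither measure says anything about why a paper was cited.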

However, as Daniel Lemire points out in his latest blog post, raw citation counts are a pretty crude measure. In any given article, it’s often hard to tell which of the (typically) dozens of references are cited only “en passant” (to fend off critics who might think you haven’t read the literature) or are otherwise incidental to the substance of the article. What interests the authors of the cited articles is the question: “How critical is this citation to the author who cited me?”

One way to find out (and hence, perhaps, to build a better citation measure) is to train a machine learning algorithm to extract “key citations”, by analogy with extracting “key phrases” from a text (see Peter Turney’s 2000 article Learning Algorithms for Keyphrase Extraction). As a starting point, we’d like to collect data from researchers by asking them: “What are the key references in your papers?”
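To make the idea concrete, here is a minimal sketch that treats key-citation extraction as binary classification of each reference. Everything in it is an assumption for illustration: the two features (how often a reference is mentioned in the article body, and whether it is mentioned in the methods section) are hypothetical, the classifier is plain logistic regression trained by gradient descent rather than any method the project has settled on, and the training examples are made up, not survey responses. In practice the labels would come from authors marking their essential references.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-reference features (illustration only):
#   x[0] = number of times the reference is mentioned in the article body
#   x[1] = 1.0 if it is mentioned in the methods section, else 0.0
Features = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data: List[Tuple[Features, int]], lr: float = 0.1,
                   epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias with batch gradient descent on the log-loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # Prediction error for one reference (predicted probability minus label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def is_key_citation(x: Features, w: List[float], b: float,
                    threshold: float = 0.5) -> bool:
    """Classify a reference as 'key' if its predicted probability clears the threshold."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold


# Made-up toy examples, NOT survey data:
# (features, 1 if the author marked the reference as essential, else 0)
toy = [
    ([1.0, 0.0], 0), ([1.0, 0.0], 0), ([2.0, 0.0], 0), ([1.0, 1.0], 0),
    ([5.0, 1.0], 1), ([4.0, 1.0], 1), ([6.0, 0.0], 1), ([3.0, 1.0], 1),
]
w, b = train_logistic(toy)

if __name__ == "__main__":
    print("weights:", w, "bias:", b)
    print("reference mentioned 10 times, incl. methods:", is_key_citation([10.0, 1.0], w, b))
```

With real labels one would evaluate on held-out papers rather than on the training examples; the point here is only the shape of the pipeline: per-reference features in, a key/non-key decision out.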

It will take about 10 minutes: please fill in this Google Docs questionnaire. In it, we ask you, as the author of an article, to tell us which 1, 2, 3 or 4 references are essential to that article. By an essential reference, we mean one that was highly influential or inspirational for the core ideas in your paper: a reference that inspired or strongly influenced your new algorithm, your experimental design, or your choice of research problem.

When the survey is complete, we will release the resulting data set under the ODC Public Domain Dedication and Licence, so that you can use the data in other ways if you wish.


1. From counting citations to measuring usage (help needed!) - March 20, 2012

[…] reading: Building a Better Citation Index by Andre Vellino […]

2. gawp - April 17, 2012

This will be a useful data set. Will try to fill this out for some of my papers.

I’ve often wondered if it was possible to identify the “polarity” of a reference; papers that are cited as part of a critique or refutation. Those are “negative” references and it would be interesting to try to identify them, as they have a different meaning. Maybe sentiment analysis around the reference point in the text…?

3. The HIP-index: A Better Measure of Research Impact | Synthèse - November 16, 2013

[…] months ago, Xiaodan Zhu, Peter Turney, Daniel Lemire and I embarked on an experiment to see if we could identify the features in an article that would enable us to identify the […]
