DeepDyve’s iTunes Business Model October 27, 2009

Posted by Andre Vellino in Digital library, Information retrieval, Search.

DeepDyve appears to have adopted an iTunes-like business model: $0.99 rentals for scientific research articles!

I like many things about the search engine – the way one can enter entire paragraphs of text as a query block, for instance.  You can use that feature in PLoSONE (Public Library of Science), though it is not obviously available on the DeepDyve site itself.

I don’t think that knowledge can be commodified in this way, though. It doesn’t look like it is going to be a sustainable business model.

Nothing is “Miscellaneous” October 17, 2009

Posted by Andre Vellino in Classification, Collaborative filtering, Information retrieval, Recommender service.

I think I now understand why David Weinberger’s book “Everything is Miscellaneous” is so provocative and sometimes enraging.  It often sounds like he’s claiming that there is no point at all in classifying / categorizing information.  No matter what you do, you’re going to get the category “wrong” because there is no such thing as a “right” category. Ergo, don’t even try – everything belongs in the category “Misc”.

I think Weinberger’s emperor has no clothes – in fact, he is asserting that nothing is “Miscellaneous”. Everything belongs to some category for someone, it’s just that it may not be the same category for everyone. A banana is likely to be a fruit for most people, but also a weapon for John Cleese.  The point is: a banana is always a kind of something in every context.

So isn’t there a middle ground between banishing the Dewey decimal system (or indeed any other library classification system) and dumping every digital object into an undifferentiated pile?  Indeed, there’s a lot to be said for a thoroughly well-understood standard system of classification, even a dated or bad one: at the very least, it is predictable.  If you know how the metadata for a given item was generated (e.g. call number, subject category, keywords), you’ll be better able to retrieve it.

Furthermore, I expect there are some unforeseen problems with the democratization of knowledge generated by social tagging and recommender systems.  Who’s doing the tagging?  Who’s doing the bookmarking? High school students?

This is of particular concern to me in the context of scholarly articles. Are the numbers of co-downloads in a digital library primarily due to professors’ undergraduate course syllabi?  Would professors’ syllabi be influenced by scholarly recommender systems?  I expect that the recommender effect studied in Daniel Fleder’s “Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity”, which shows that recommenders decrease aggregate diversity, would be an especially acute problem when the sources of co-download behaviour are (relatively) few (e.g. professors’ course syllabi).

Conclusion? I think it matters what population you are drawing from for your metadata – be it social tagging or collaborative filtering recommendations.  There is a point in relying on experts and big thinkers.  They are more knowledgeable and credible than even the collective intelligence of the masses.

CiteUlike Recommender September 28, 2009

Posted by Andre Vellino in Collaborative filtering, Recommender, Recommender service.

The recommender system that Toine Bogers experimented on a few years ago with CiteUlike data, and which is the subject of a very interesting poster given at Recommender Systems 2008, is now on-line at CiteUlike.

Paradoxically, my personal CiteUlike library of (only) 22 articles (mostly on recommender systems) isn’t sufficient to generate any recommendations. Probably there aren’t enough people who have similar collections.
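
To illustrate why a small or unusual library might yield nothing, here is a minimal sketch of a neighbourhood-style collaborative filter. It is not the algorithm CiteUlike actually uses (I don't know the details of that); the function names, the Jaccard similarity measure and the thresholds `min_similarity` and `min_neighbours` are all assumptions made up for the example.

```python
def jaccard(a, b):
    """Overlap between two libraries: shared articles / all articles."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(my_library, other_libraries, min_similarity=0.2, min_neighbours=2):
    """Return articles held by similar users, or [] if too few neighbours exist.

    Both thresholds are hypothetical values chosen for illustration.
    """
    mine = set(my_library)
    neighbours = [lib for lib in other_libraries
                  if jaccard(mine, lib) >= min_similarity]
    if len(neighbours) < min_neighbours:
        return []  # not enough people with similar collections
    counts = {}
    for lib in neighbours:
        for article in set(lib) - mine:
            counts[article] = counts.get(article, 0) + 1
    # Most widely held articles first; ties broken alphabetically.
    return sorted(counts, key=lambda a: (-counts[a], a))


# Toy data: article identifiers are placeholders.
others = [{"a", "b", "x"}, {"a", "b", "x", "y"}, {"p", "q", "r"}]
well_connected = recommend({"a", "b", "c"}, others)
isolated = recommend({"z1", "z2"}, others)
print(well_connected)
print(isolated)
```

In this toy setup the second library overlaps with nobody else's, so the filter has no neighbours to draw on and returns nothing, which is roughly the situation I suspect I am in.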

Logicomix September 28, 2009

Posted by Andre Vellino in Logic.

Personally, I am glad for book reviews at the New York Times.  Without them, it would no doubt have taken me much longer to discover LogicoMix, the comic book version of “the epic story of the quest for the Foundations of Mathematics”.

The concept and story come from Christos H. Papadimitriou, who teaches theoretical computer science at UC Berkeley and wrote the erudite textbook “Elements of the Theory of Computation”.

One anonymous reviewer of LogicoMix on Amazon says:

…the book tries to be too many things at once, and succeeds as none of them. It is neither a strong introduction to Russell’s ideas, nor a worthwhile biography in condensed form, nor a successful piece of historical comic art. It’s a pleasant enough read, but considering its ambition ultimately a disappointing one.

So… my expectations from the NYTimes’ (mostly) positive review are dampened somewhat.  Still – how often does this golden age of logic get the attention of the graphic novelist?  Maybe they scooped Art Spiegelman.

Information Rich, Attention Poor September 25, 2009

Posted by Andre Vellino in Attention, Information.

In his recent essay in the Globe and Mail, Peter Nicholson makes some interesting observations about the relationship between information and attention.

He observes that thanks to computing technology and crowdsourcing, there is an abundance of low-cost information and that the scarce resource now is attention (whose correlate resource is our time). We waste information because it’s free and we favour superficial coverage versus depth of thought.

We may think metaphorically of the production of knowledge as a function of “information” and “attention,” with attention understood as the set of activities by which information is ultimately transformed into various forms of knowledge.

I don’t think I’ve heard this non-cognitive definition of attention before: that which transforms information into knowledge.  It doesn’t quite work as a definition but attention is no doubt one of the ingredients required for this transformation. Intelligence is probably helpful as well.

Nicholson also observes that knowledge has changed recently from “stock” to “flow”. Knowledge-as-“thing”, an object to be accumulated and stored (stock), belongs to 20th century libraries.  21st century knowledge is more a “process” that changes and is updated all the time (flow).

There are two reasons for this, Nicholson says. One is that electronic information “permit[s] it to be changed continuously and almost at no cost.” The other is the “shift of intellectual authority from producers of depth – the traditional “expert” – to the broader public.” The result of this shift

….is the growing disintermediation of experts and gatekeepers of virtually all kinds.

This increasing “disintermediation” means we no longer need to think deeply for ourselves – we can rely on the wisdom of the crowds for just-in-time consumption (viz. Wikipedia).

This can’t last, he argues.  The buck has to stop somewhere: the deep thinking and attending has to be done by someone.

What is apparently being eroded is the deep, integrative mode of knowledge generation that can come only from the “10,000 hours” of individual intellectual focus.

Maggie Jackson in her book “Distracted“ goes further and suggests that our individual and collective inability to focus threatens the very fabric of civil society.  Our inability to pay attention makes us unable to distinguish between the trivial and the important.

A recent article entitled “Cognitive control in media multitaskers” published in the Proceedings of the National Academy of Sciences provides some evidence to support this.  In the abstract, the authors say:

….heavy media multitaskers performed worse on a test of task-switching ability, likely due to reduced ability to filter out interference from the irrelevant task set.

This is consistent with the view that multitaskers are less able to discriminate between tasks that are important and those that are trivial.

Maggie Jackson isn’t alone in being concerned about our collective attention deficit disorder. Ottawa’s Heather Menzies concurs in “No Time“.

I was re-reading a 1995 (i.e. pre-commercial-web) Utne Reader article the other day in which Pico Iyer was quoted as saying:

I worry about the relentless acceleration of the world, the dramatic shortening of our attention spans and the temptation [...] to value information before knowledge and knowledge before wisdom.

Trustworthy Knowledge September 14, 2009

Posted by Andre Vellino in Epistemology.

A few months ago the Irish sociology student Shane Fitzgerald perpetrated a Wikipedia hoax that led to a quote being mis-attributed to the composer Maurice Jarre in the Guardian’s obituary of him.  This led pundits to reflect on what counts as an authoritative, trustworthy source of knowledge.  They concluded that

  • Wikipedia isn’t authoritative;
  • journalists are lazy and don’t check their facts and
  • If such simple mis-attributions can be printed in a Guardian obituary, what information sources can we trust?

These observations, however true they may be, miss an important point – the need for authoritative sources depends on how likely or unlikely any bit of information is a priori. If I told you that Aung San Suu Kyi has decided to support the Burmese Military Junta, you would have every right to consider me a crackpot and demand that I document my claim, given what we know about her political history. That Maurice Jarre might have said (as was asserted in the Wikipedia hoax)

“My life has been one long soundtrack. Music was my life, music brought me to life.”

is not entirely unbelievable (unless, perhaps, you knew Jarre personally and were quite sure it wasn’t in his character to say such a thing.)  How important should it have been for the journalist to substantiate the attribution of this quote? Not as important, I submit, as if the journalist had reported that Jarre had spent the first 5 years of his infancy being raised by wolves in Siberia – well, perhaps it would have been important to get that right, given the low probability that this is true.
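
The same point can be put in terms of Bayes’ rule. The sketch below is only an illustration: the priors and the reliability of an unchecked source are made-up numbers, and `posterior` is a helper written for this example, not anything measured or taken from the articles above.

```python
def posterior(prior, p_report_if_true, p_report_if_false):
    """Probability that a claim is true, given that a source reports it."""
    evidence = prior * p_report_if_true + (1 - prior) * p_report_if_false
    return prior * p_report_if_true / evidence


# Assumption: an unchecked source repeats a true claim 90% of the time
# and a false one 10% of the time. Both priors are hypothetical too.
plausible = posterior(prior=0.5, p_report_if_true=0.9, p_report_if_false=0.1)
implausible = posterior(prior=0.001, p_report_if_true=0.9, p_report_if_false=0.1)

print(f"plausible quote:  {plausible:.3f}")
print(f"raised by wolves: {implausible:.3f}")
```

With these numbers, the same unchecked source leaves the plausible quote quite likely to be genuine, while the wolf story remains very improbable. The lower the prior, the more independent corroboration a claim needs before it is worth printing.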

In practice, academic peer-reviewing also depends on the a priori probability of a paper’s claim. Despite the methodologically sound call for repeatable experiments, documented procedures, public datasets etc., there just aren’t enough hours in the day to comb through a paper’s claim in detail, unless the claim is unlikely to be true, given what we already know.

The now 20-year-old Cold Fusion debacle is a good example of that.  Given the extraordinary claim that nuclear fusion can happen at room temperature, it’s obviously critical that the experiment that demonstrates this phenomenon be both repeatable and repeated.  But I think it’s unreasonable to expect the same level of scrutiny to hold for what Thomas Kuhn called “Normal Science”.