
Is Everything Miscellaneous? June 22, 2007

Posted by Andre Vellino in Information retrieval.

David Weinberger is promoting his book Everything is Miscellaneous and there are videos both of the talk he gave at Google and the talk he gave at Yahoo. The Google talk is more “polished” in some ways (PPT charts etc.) but the Yahoo talk is more like a fire-side chat, which I prefer.

His argument, in a nutshell, is that society, media, authorities, libraries etc. have placed constraints (i.e. limitations) on how we organize things (information in particular) by imposing an (arbitrary) order (e.g. library catalogs, newspaper sections, TV schedules) controlled by “experts”. Now, however, the new-world order, empowered by collaborative tools and common spaces on the web (Flickr etc.), is democratic – we can (individually and collectively) define our own order on things (e.g. by tagging things via folksonomies). Nothing belongs to fixed categories anymore; everything should be classified under “miscellaneous”.

I can think of several counter-arguments. One is the observation that there are “natural kinds”. Trees are trees, dogs are dogs and Aristotle was right about (some aspects of) taxonomies. There is a way in which the world is – it isn’t just what we want it to be.

Secondly, I like to defer to authorities (sometimes) – like lexicographers and experts who know more than I do. When I want to learn about something I look for an authority that I can trust has done a lot of deep thinking that I haven’t had the time to do.

Thirdly, predictable sources of information – business sections of the newspaper, Slashdot and The Register for all things geeky, and even old-fashioned dictionaries – have real value because they give us conventions and (some) predictability.

As any linguist will tell you, the meanings of words in a language change over time. Language is a dynamic thing. But it also depends on convention. That the phrase “that’s really sick” means the same thing as what in my generation was “that’s really cool” requires a community of users to agree, by convention, about the meanings of words, and dictionaries reflect those conventions.

There’s real value in having understood and commonly held conceptual structures – mathematics being the best of them – with which to define relationships between observables and create explanatory models of how the world might be. Not every way of “slicing and dicing the world” is on par – Einstein’s theory of gravitation is better than Newton’s.

Furthermore, there is no theory-free data. It’s virtually useless to just dump the text of 12 million scientific articles into an index and say that you have a science library. Even Google’s full-text index is imbued with some “world-knowledge” – if only through PageRank or TF-IDF.
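To make the TF-IDF remark concrete, here is a minimal sketch of how even a plain term-weighting scheme encodes assumptions about what matters in a text. It is not how Google or any particular search engine computes its weights; the whitespace tokenizer, the length-normalized term frequency, the `log(N / df)` form of the inverse document frequency, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not from the post): whitespace tokenization, term counts
    normalized by document length, and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus, invented for this example.
docs = [
    "the cathedral of notre dame",
    "the clouds above the cathedral",
    "the theory of gravitation",
]
weights = tf_idf(docs)
```

A word that appears in every document (“the”) gets weight zero, while a word found in only one document (“notre”) outweighs one found in two (“cathedral”). That ranking is itself a small theory about which words are informative.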

Take for example the impressive Seadragon demo that reverse-engineers a 3-D model of Notre Dame from photographs. If everything is miscellaneous, then we have no theories about the world or the way the world is. These photographs don’t reconstruct clouds, or an abstract human body (there are lots of both in the original stills), but Notre Dame. Why? Because Notre Dame was the “object of interest” and they were looking for Notre Dame elements in each photo.

We always have an “object of interest” (or a question we want to answer) whether we do science experiments or type a query in a search engine. We come to data and to the world with preconceptions about how it might be. I think there’s some merit in sharing those preconceptions!

In a way, I think that’s Weinberger’s point: we are more free now, in the new world order, to create these shared world-views rather than have them imposed upon us. So I applaud the fact that the Gilbert library has dropped the Dewey decimal system for organizing its books – Free at Last!… but free from one tyranny only to have it replaced by another (in this case subject classification).

My point is – we’re always going to be bound by some way of organizing the world, and if we want to be understood, it needs to be one that we share in common with others. Personally, I don’t enjoy the tyranny of folksonomies – I don’t understand them (well) and I prefer a natural (librarian) or artificial (AI) classifier. Even if some of the categories are wrong – wildly wrong, even – at least we get some consistency.

Comments»

1. Daniel Lemire - June 25, 2007

I think you are doing the “all must be black” and “all must be white” dance.

My point of view is that systematically, we overengineer systems in a top-down way. I never said that expert knowledge was not useful. It is just not the oracle we think it is. People who design systems usually overassume. They also put themselves way above their users and want to gain control through the new tools.

Amazon is a good example of this. I do not want the price, the year of publication or the official book summary to be tinkered with. However, I find “people who browsed this book did this next” insanely more useful than old school expert annotations.

Google is also a good example. I do not want people to edit the title of the page, or to rank pages manually… but I sure want Google to take input from web sites through the link structure. I also think that Google will benefit from capturing user behavior.

It does not need to be fancy. “Users who clicked on this page always also visited this other page”. Sort of.
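A minimal sketch of that kind of “users who visited this page also visited” signal, assuming each user session is available as a list of page identifiers. The function names `co_visits` and `also_visited` and the sample sessions are hypothetical; this is plain co-occurrence counting, not a description of what Google or Amazon actually run.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_visits(sessions):
    """Count, for every pair of pages, how many sessions contain both."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Each page counts at most once per session.
        for a, b in combinations(sorted(set(pages)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def also_visited(counts, page, k=3):
    """Return up to k pages most often seen in the same session as `page`."""
    return [p for p, _ in counts.get(page, Counter()).most_common(k)]

# Hypothetical click logs: one list of page ids per user session.
sessions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "d"],
]
counts = co_visits(sessions)
suggestions = also_visited(counts, "a")
```

No expert labels any page here; the only input is what users did.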

Here’s what you should be comparing:

1) We take all of the documents in a library and some experts classify them in a taxonomy, preventing users from ever annotating these documents. In effect, these experts claim to be oracles. This is pretty much how old school libraries work, and they are really, really obsolete now. I used to rage against how hard it was to find a book, now I know why: the librarians made it difficult. I had to *think* like them… but my point of view is systematically different.

2) We take all documents in a library, we full-text index them, we automatically process citations and stuff, and allow users to annotate and tag. We allow users to share their best references. And we create wikis where users can comment on books.

Notice that I do not give *all* power to users. You still need a benevolent dictator.

There are still people in wikipedia who have the root passwords. But they are not all powerful. If they start bitching us, someone will grab a copy and create a new wikipedia.

Essentially, I call for more humility.

There is no God-given taxonomy. Taxonomies are a form of dimensionality reduction our brain uses, that is all.

A tree is a tree? Oh yes? Is a planet a planet? What about Pluto? Are you assuming that experts all agree in their classifications?

> we’re always going to be bound by some way of organizing the world, and if we want to be
> understood, it needs to be one that we share in common with others

Sure, but probably less than we think at first glance. Millions of people have learned an approximate English and they communicate very efficiently. I learned English a little bit. I very rarely speak it. I make tons of mistakes in English. And yet, I claim, it does not matter.

Yes, you need some common ground. You need some common vocabularies. Yes, you need some consistency. Yes, 1+1 = 2.

But to achieve all of this, you do not need to have all powerful enforcers. I think you will find that communities will kick out people who say that 1+1=3.

>Even if some of the categories are wrong – wildly wrong, even – at least we
> get some consistency.

Ok. That’s the precision versus recall game. Google Scholar has poor precision. It is filled with (serious) errors. Yet, it is an order of magnitude more useful than the best tools CISTI has to offer.

Don’t believe me? Then run a user study.

I think you will find that you overestimate the need for accuracy. Accuracy and precision matter, but less than we think.
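For readers who have not met the precision versus recall trade-off, here is a small sketch of the two measures. The document ids and both result sets are invented; they are not measurements of Google Scholar or of any CISTI tool.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query with four truly relevant documents.
relevant = {"d1", "d2", "d3", "d4"}

# A carefully curated index: no errors, but it misses half the material.
curated = precision_recall({"d1", "d2"}, relevant)

# A broad, noisy index: finds everything, along with as much junk.
broad = precision_recall(
    {"d1", "d2", "d3", "d4", "x1", "x2", "x3", "x4"}, relevant
)
```

In this toy case the curated index scores (1.0, 0.5) and the broad one (0.5, 1.0). Which of the two is more useful to real users is exactly what a user study would have to settle.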

2. Andre Vellino - June 25, 2007

Thanks for your comments Daniel. I (mostly) agree with you (and Weinberger) – I just needed to voice a few counterpoints.

It’s true that some “natural kinds” (like ‘planet’) can be deceiving. My main point is that Democracy (on the Internet, anyway) isn’t (always) what it’s cracked up to be. I *do* trust experts (like MDs) and specialists in information retrieval and Statistical Semantics to show me the way. So I also agree with you about benevolent dictatorships.

And of course I believe in the usefulness of usage-data and CF to add to / complement other, static data (e.g. sometimes-not-so-useful metadata). What people do with information and how they behave is useful (even more in a digital science library than on Amazon.)

But what you and Peter do is much more valuable (to me, anyway) than what Britney Spears does :-).

