Freebase Revisited
July 5, 2007
Posted by Andre Vellino in Data Mining, Information retrieval, Open Access.
A few months ago, I noted the arrival of Freebase, the “Creative Commons Database”. Since Peter is fond of analogies, I would say “Web Pages are to Wikis as Databases are to Freebase”. After a few months of waiting, I got my username and password today and had a chance to take a closer look at Freebase.
The idea is growing on me. I think Freebase strikes a nice balance between “Everything is Miscellaneous” (anarchy) and “Do It My Way” (totalitarianism). The creators of databases in this creative commons can choose and control the data schemas but, if others don’t like them, they have the freedom to create their own. So you have a certain anarchy, in that anyone can do whatever they like (constrained by some organizational principles), but you end up with information that is addressable (every data element in the entire commons is guaranteed to have a unique key, for instance) and (somewhat) structured.
It isn’t anarchy because Freebase imposes some organizational principles on every database: the concepts of “Domains”, “Types” and “Topics”. Searching Freebase’s own database for definitions of “Domain”, “Type” and “Topic” yields:
A domain represents a set of related types, and also serves as a namespace for those types. For access control a domain object refers to one or more usergroup objects that administer the domain. Only members of the specified usergroups are allowed to add or edit types within the domain.
A type is a category of being. A human is a type of thing; a cloud is a type of thing (entity); and so on. A particular instance of a type is called a token of that thing; so Socrates was a token of a human being, but is not any longer since he is dead. Likewise, the capital A in this sentence is a token of the first letter of the Latin alphabet.
[A] topic is one of the core types in Freebase. Topics contain a set of default properties that are generally useful when describing a topic: display name, alias, article, image and webpage.
So these principles have both pragmatic and “ontological” reasons for being: they provide access control over the structure of the database schema, and they also give a method for annotating data so that it is easy, for example, to disambiguate search results.
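To make the three definitions above a little more concrete, here is a minimal Python sketch of how they might fit together. The class names, field names and the `add_type` method are hypothetical and chosen only for illustration; this is not Freebase's actual schema or API. It assumes that a type's key is built from its domain's key, and that edit rights are checked by intersecting a user's groups with the domain's administering groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Type:
    # Key is namespaced by the owning domain, e.g. "/music/artist" (assumed format).
    key: str


@dataclass
class Domain:
    """A namespace for related types, administered by usergroups (hypothetical model)."""
    key: str
    admin_groups: Set[str] = field(default_factory=set)
    types: Dict[str, Type] = field(default_factory=dict)

    def add_type(self, name: str, user_groups: Set[str]) -> Type:
        # Only members of an administering usergroup may add types to the domain.
        if not (self.admin_groups & user_groups):
            raise PermissionError(f"not allowed to edit domain {self.key}")
        new_type = Type(key=f"{self.key}/{name}")
        self.types[name] = new_type
        return new_type


@dataclass
class Topic:
    """Carries the default properties listed in the definition of a topic."""
    display_name: str
    aliases: List[str] = field(default_factory=list)
    article: Optional[str] = None
    image: Optional[str] = None
    webpage: Optional[str] = None
    types: List[Type] = field(default_factory=list)


music = Domain(key="/music", admin_groups={"music_admins"})
artist = music.add_type("artist", user_groups={"music_admins"})
socrates = Topic(display_name="Socrates", aliases=["Sokrates"])
```

Here a user outside `music_admins` who calls `music.add_type(...)` gets a `PermissionError`, which is the access-control half of the story; the `types` list on a `Topic` is the annotation half that would help with disambiguating search results.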
Freebase also provides developer tools. They devised their own (dare I say LISP-ish?) query language called MQL and a JavaScript / HTML templating engine called MJT that lets you manipulate JSON data (e.g. from a web service) in the browser.
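For a flavour of MQL: queries are written as JSON templates in a query-by-example style, where you fill in what you know and leave blanks (`null` or `[]`) for what you want returned. The Python sketch below builds one such query and answers it against a tiny in-memory list. The `answer` function and the `DATA` list are stand-ins invented for this illustration; they are not Freebase's query engine or its data service, and they handle only flat, single-level queries.

```python
import json

# An MQL-style query: known values are filled in, blanks ([] or None) are requested.
query = [{"type": "/music/artist", "name": "The Police", "album": []}]

# Toy in-memory records standing in for the real database (illustrative only).
DATA = [
    {"type": "/music/artist", "name": "The Police",
     "album": ["Outlandos d'Amour", "Synchronicity"]},
    {"type": "/music/artist", "name": "Miles Davis",
     "album": ["Kind of Blue"]},
]


def answer(pattern, records):
    """Return records that match the filled-in fields, with the blanks filled in."""
    # Fields with a concrete value act as constraints; blanks are just requested.
    known = {k: v for k, v in pattern.items() if v not in (None, [])}
    results = []
    for record in records:
        if all(record.get(k) == v for k, v in known.items()):
            # Echo back exactly the fields named in the query.
            results.append({k: record.get(k) for k in pattern})
    return results


result = answer(query[0], DATA)
print(json.dumps(result, indent=2))
```

The result has the same shape as the query, which is what makes the style pleasant to work with: the question doubles as a template for the answer.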
In short, Freebase gives you a place, Wikipedia style, for the community to define what used to be called “world knowledge” in a (relatively) free but systematic way, and a set of tools for building applications on top of it.
I think good things will come of this.