CYC Game
November 17, 2007
Posted by Andre Vellino in Knowledge Representation, Logic, Semantics.
In the “neat vs. scruffy” debate in AI, my dedication to the “neat” camp is wavering. Granted, logic is interesting and useful, but is it really the right formalism for knowledge representation?
Take the CYC project, for example. It is tempting to believe, following Wittgenstein’s Tractatus, that “the world is the totality of facts” and “the facts in logical space are the world”. And the CYC project has been driven by this temptation: OpenCYC now contains about 300,000 “concepts”, 3,000,000 “assertions” and 26,000 relations between them, along with an inference engine with which to draw conclusions.
If you believe in helping CYC to learn, you can play this collaborative game to help CYC learn more facts about the world. The game composes (seemingly random) questions about relations that might be meaningful or true or false and you get to tell it whether these generated propositions are true or not.
Tools are typically found in jewelry store facility (sic – absence of “a” article between “in” and “jewelry”)
True or false? Well, that’s a good question, isn’t it? What does CYC mean by “tools”? Garden tools? Woodworking tools? Or watch repair tools? Apparently 36% of game-players think this proposition is false. But it’s true, under at least one sensible interpretation of “tools”, right?
Every internal combustion-powered motor vehicle has exactly one gas cap.
Only 41% of respondents thought that was true. Well, of course, this proposition isn’t necessarily true: some vehicles may have two gas tanks. But generally it is true. Do we really need to encode universally quantified assertions of any kind? I suppose the reason would be to save space and to use universal instantiation to deduce new facts from general rules. But do we in fact reason in Socratic syllogisms?
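To make the worry concrete, here is a minimal Python sketch of universal instantiation applied to the gas-cap rule. Everything in it is invented for illustration (the `Vehicle` class, the `predicted_gas_caps` function and the three-vehicle fleet); it is not how CYC represents or applies rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vehicle:
    name: str
    internal_combustion: bool
    gas_caps: int  # what we would actually observe on the vehicle


def predicted_gas_caps(vehicle: Vehicle) -> Optional[int]:
    """Instantiate the universal rule for one individual vehicle."""
    # Rule from the game: every internal combustion-powered motor vehicle
    # has exactly one gas cap. The rule is silent about other vehicles.
    return 1 if vehicle.internal_combustion else None


# Hypothetical fleet, made up for this example.
fleet = [
    Vehicle("hatchback", True, 1),
    Vehicle("dual-tank pickup", True, 2),
    Vehicle("electric scooter", False, 0),
]

# A single individual that contradicts the prediction falsifies the rule.
counterexamples = [
    v.name
    for v in fleet
    if predicted_gas_caps(v) is not None and predicted_gas_caps(v) != v.gas_caps
]
print(counterexamples)  # ['dual-tank pickup']
```

The rule is compact and lets us deduce a gas-cap count for any vehicle we have never inspected, but one dual-tank truck is enough to make the universally quantified version false.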
Consider this assertion from the CYC game:
Most BTR70 armored personnel carriers are wider than most BDRM-2s.
I have no idea, of course. But how many possible facts of that kind are there? Suppose there are only 4 relations being considered between objects that take up volume in space: “wider”, “taller”, “heavier”, “more fragile”. And suppose there are 100,000 objects worth considering under those relations. Comparing every pair of objects under each relation gives about 20 billion facts right there.
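For a rough sense of the scale, here is a small Python sketch of the count. It assumes one comparative fact per unordered pair of distinct objects per relation; the function name `count_pairwise_facts` is made up for this example.

```python
from math import comb


def count_pairwise_facts(num_objects: int, num_relations: int) -> int:
    """Count comparative facts: one per unordered pair of objects per relation."""
    return num_relations * comb(num_objects, 2)


total = count_pairwise_facts(100_000, 4)
print(f"{total:,}")  # 19,999,800,000
```

Under these assumptions the count grows with the square of the number of objects, so doubling the objects roughly quadruples the number of candidate facts.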
Another assertion was:
A feeling of courage is unlikely to be accompanied by a feeling of initiative.
100% of respondents (except me) thought this was false! Really? What about the (presumably) courageous World War One soldiers who blindly followed orders to their certain deaths – were they showing initiative?
Or how about:
The act of Irish step dancing expresses enjoyment.
which CYC believes to be true. Well, I’m sure many Irish step dancers enjoy what they do and I doubt there’s much Irish dancing at funerals, but is this statement really true?
These completely nonsensical ones were pretty funny:
Pages are typically located in homes.
Alwayses are typically located in school building k through 12.
People typically perform or are involved in retirement more frequently than they perform or are involved in confusing an opponent.
One has to wonder: does a machine really “know” anything about the component terms in these assertions if it needs to ask about them?
What about transient “facts”, assertions whose truth depends on conditions in time? For instance:
Islam is a major religion in the united kingdom.
Most government ministers are taller than most spokespersons.
which CYC also believes to be true. Well, I don’t know, but if these statements aren’t true now, they may well be one of these days, and even if they are true now, they may not always be. So in what sense are these assertions “facts”?
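One way to picture the problem is to attach a validity interval to each assertion, so that "true" always means "true on a given date". The Python sketch below does that; the `TimedAssertion` class, the statement and the dates are all invented for illustration, and none of it describes how CYC actually handles time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedAssertion:
    statement: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "no known end date"

    def holds_on(self, day: date) -> bool:
        """Return True if the assertion is considered valid on the given day."""
        if day < self.valid_from:
            return False
        return self.valid_until is None or day <= self.valid_until


# Hypothetical assertion with made-up dates.
claim = TimedAssertion(
    "Alice is a government minister",
    valid_from=date(2005, 1, 1),
    valid_until=date(2007, 6, 30),
)

print(claim.holds_on(date(2006, 3, 15)))    # True
print(claim.holds_on(date(2007, 11, 17)))   # False
```

Even in this toy form, the assertion on its own has no truth value; it only gets one once a date is supplied.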