<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Synthèse</title>
	<atom:link href="http://synthese.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://synthese.wordpress.com</link>
	<description>&#34;Opération qui procède du simple au composé, de l&#039;élément au tout.&#34;</description>
	<lastBuildDate>Thu, 12 Jan 2012 06:43:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='synthese.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Synthèse</title>
		<link>http://synthese.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://synthese.wordpress.com/osd.xml" title="Synthèse" />
	<atom:link rel='hub' href='http://synthese.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Review: &#8220;Mahout in Action&#8221;</title>
		<link>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/</link>
		<comments>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 15:23:46 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Book Review]]></category>
		<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Recommender service]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=714</guid>
		<description><![CDATA[In early September 2010 (I&#8217;m embarrassed to count how many months ago that was!) I received an Early Access (PDF) copy of &#8220;Mahout in Action&#8221; (MIA) from Manning Publications and was asked to write a review. There have been 4 major updates to the book (now no longer &#8220;early access&#8221;!) since then and although it is too [...]]]></description>
			<content:encoded><![CDATA[<div>
<p><img class="alignleft" src="http://www.manning.com/owen/owen_cover150.jpg" alt="" width="150" height="188" />In early September 2010 (I&#8217;m embarrassed to count how many months ago <em>that</em> was!) I received an Early Access (PDF) copy of &#8220;<a href="http://www.manning.com/owen/">Mahout in Action</a>&#8221; (MIA) from Manning Publications and was asked to write a review. There have been 4 major updates to the book (now no longer &#8220;early access&#8221;!) since then, and although it is too late to fulfill their purpose in giving me early access (no doubt a supportive quote for the dust jacket or web site), I thought I&#8217;d nevertheless post my belated notes.</p>
<p><a href="http://lucene.apache.org/mahout/">Mahout is an Apache project</a> that develops scalable machine learning libraries for recommendation, clustering and classification. Like many other such software-documentation &#8220;in Action&#8221; books for Apache projects (Lucene / Hadoop / Hibernate / Ajax, etc.), the primary purpose of MIA is to complement the existing software documentation with both an explanatory guide for how to use these libraries and some practical examples of how they would be deployed.</p>
<p>First I want to ask: &#8220;how does one go about reviewing such a book?&#8221; Is it possible to dissociate one&#8217;s opinion about the book itself from one&#8217;s opinion of the software? If the software is missing an important algorithm, does this impugn the book in any way?</p>
<p>The answers to these questions are, I think, &#8220;yes&#8221; and &#8220;no&#8221; respectively. Hence, the following comments assess the book on its own merits and in relation to the software that it documents, not in relation to the machine learning literature at large. Indeed, the fact that this book is not a textbook on or an authoritative source for machine learning is made quite explicit at the beginning of the book and the authors make no claim to being experts in the field of Machine Learning.</p>
<p>It&#8217;s important to understand that Mahout came about in part as a refactoring exercise in the <a href="http://lucene.apache.org/">Apache Lucene</a> project, since several modules in Lucene use information retrieval techniques such as vector-based models for document semantics (see the survey paper by Peter Turney and Patrick Pantel &#8220;<a href="http://www.jair.org/media/2934/live-2934-4846-jair.pdf">From Frequency to Meaning: Vector Space Models of Semantics</a>&#8221;). The amalgamation of those modules with the open source collaborative filtering system (formerly called <em>Taste</em>) by co-author Sean Owen yielded the foundation for Mahout.</p>
<p>Thus, if there are gaps in the Mahout software it is an accident of history more than a design flaw. Like most software &#8211; especially open-source software &#8211; Mahout is still &#8220;under construction&#8221;, as evidenced by its current version number (&#8220;0.5&#8221;). Even though many elements are quite mature, there are also several missing elements, and whatever lacunae there are should be considered an opportunity to contribute and improve this library rather than to criticize it.</p>
<p>One obvious source for comparison is <a href="http://www.cs.waikato.ac.nz/ml/weka/">Weka</a> &#8211; also an open-source machine learning library in Java. The book associated with this library &#8211; <a href="http://www.cs.waikato.ac.nz/~ml/weka/book.html">Data Mining: Practical Machine Learning Tools and Techniques</a> (Second Edition) by Ian H. Witten and Eibe Frank &#8211; was published in 2005 and has a much more pedagogical purpose than Mahout in Action. In contrast with MIA, &#8220;Data Mining&#8221; is much more of an academic book, written by academic researchers, whose purpose is to teach readers about Machine Learning. In that way, these two books are complementary, particularly as there are no algorithms devoted to recommendations in Weka and many more varieties of classification and clustering algorithms in Weka than in Mahout.</p>
<p>The Mahout algorithms that are discussed in MIA include the following.</p>
<ul>
<li>Collaborative Filtering</li>
<li>User and Item based recommenders</li>
<li>K-Means, Fuzzy K-Means clustering</li>
<li>Mean Shift clustering</li>
<li>Dirichlet process clustering</li>
<li>Latent Dirichlet Allocation</li>
<li>Singular value decomposition</li>
<li>Parallel Frequent Pattern mining</li>
<li>Complementary Naive Bayes classifier</li>
<li>Random forest decision tree based classifier</li>
</ul>
<p>The integration of Mahout with Apache&#8217;s implementation of MapReduce &#8211; <a href="http://hadoop.apache.org/">Hadoop</a> &#8211; is no doubt the unique characteristic of this software. If you want to use a distributed computing platform to implement these kinds of algorithms, Mahout and MIA are the place to start.</p>
<p>On its own terms, then, how does the book fare? It is fair to say &#8211; for the quotable extract &#8211; that Mahout in Action is an indispensable guide to Mahout! I wish I had had this book 5 years ago when I was getting to grips with open source collaborative filtering recommenders!</p>
<p>P.S. This book fits clearly in the business model for open source Apache software &#8211; write great and useful software for free, but make the users pay for the documentation!  Which is only fair, I think, since $20 or so is not much at all for such a wealth of well-written software! The same can be said for Weka, whose 303 pages of software documentation still require the book to be useful.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.manning.com/owen/owen_cover150.jpg" medium="image" />
	</item>
		<item>
		<title>CISTI Sciverse Gadget App</title>
		<link>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/</link>
		<comments>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 18:06:26 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[CISTI]]></category>
		<category><![CDATA[Digital library]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=849</guid>
		<description><![CDATA[Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed a (beta) Sciverse gadget that gives searchers and researchers a window on CISTI&#8217;s electronic collection by taking the search term entered in Elsevier Hub and providing them with CISTI&#8217;s search results from a database of over 20 million [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.applications.sciverse.com/action/appDetail/298702?zone=main&amp;pageOrigin=appGallery&amp;activity=display"><img class="alignleft  wp-image-850" title="SearchAtCISTI" src="http://synthese.files.wordpress.com/2011/12/searchatcisti.jpg?w=221&#038;h=76" alt="" width="221" height="76" /></a> Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed <a href="http://www.applications.sciverse.com/action/appDetail/298702?zone=main&amp;pageOrigin=appGallery&amp;activity=display">a (beta) Sciverse gadget</a> that gives searchers and researchers a window on CISTI&#8217;s electronic collection by taking the search term entered in Elsevier Hub and providing them with CISTI&#8217;s search results from a database of over 20 million journal articles.</p>
<p><img class="alignleft  wp-image-852" title="SciverseApps" src="http://synthese.files.wordpress.com/2011/12/sciverseapps1.jpg?w=174&#038;h=82" alt="" width="174" height="82" /></p>
<p>Next year, I plan to follow up with another Sciverse gadget for my <a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/">citation-based recommender</a> that uses the full power of <a href="http://developers.sciverse.com/api">Elsevier&#8217;s API into its collection content</a>.</p>
<p>I want to commend all and sundry at Sciverse Applications for this initiative.  Opening up bibliographic data and providing developers with a developer platform (a customized version of <a href="http://docs.opensocial.org/display/OS/Home">Google&#8217;s OpenSocial platform</a>) is exactly the right kind of thing to do both to benefit third parties (they get access to otherwise closed and proprietary data) and to enhance their own search and discovery environment.</p>
<p>There are, already, several advanced and interesting applications on Sciverse. My favourites are: <a href="http://www.applications.sciverse.com/action/appDetail/297955?zone=main&amp;pageOrigin=home&amp;activity=display">Altmetric</a> (winner of the Science Challenge prize &#8211; see YouTube demo video below), NextBio&#8217;s <a href="http://www.applications.sciverse.com/action/appDetail/292667?zone=main&amp;pageOrigin=appGallery&amp;activity=display">Prolific Authors</a> and Elsevier&#8217;s <a href="http://www.applications.sciverse.com/action/appDetail/292651?zone=main&amp;pageOrigin=appGallery&amp;activity=display">Table Download</a>.</p>
<span style="text-align:center; display: block;"><a href="http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/"><img src="http://img.youtube.com/vi/zhtuBsQCLMw/2.jpg" alt="" /></a></span>
<p>And there will be more to come. An open marketplace like this where the principles of variation and natural selection can operate will, I predict, make for a richer diversity of useful search and discovery tools than any single organization can develop on its own.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/12/searchatcisti.jpg" medium="image">
			<media:title type="html">SearchAtCISTI</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/12/sciverseapps1.jpg" medium="image">
			<media:title type="html">SciverseApps</media:title>
		</media:content>
	</item>
		<item>
		<title>What is &#8216;Data&#8217;?</title>
		<link>http://synthese.wordpress.com/2011/06/14/what-is-data/</link>
		<comments>http://synthese.wordpress.com/2011/06/14/what-is-data/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 02:59:43 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Information retrieval]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=820</guid>
		<description><![CDATA[&#8220;What does &#8216;data&#8217; mean to you?&#8221; I innocently asked various participants at JCDL 2011 today.  I had just come out of a very interesting panel discussion entitled &#8220;Big Data, Big Deal?&#8221; at which most of the discussion was about large amounts of proprietary text at http://www.hathitrust.org/ (some of the discussion was also about large [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://images3.wikia.nocookie.net/__cb20061127074519/memoryalpha/en/images/1/13/Data%2C_2364.jpg" alt="" width="117" height="143" />&#8220;What does &#8216;data&#8217; mean to you?&#8221; I innocently asked various participants at <a href="http://www.jcdl2011.org/">JCDL 2011</a> today.  I had just come out of a very interesting panel discussion entitled &#8220;Big Data, Big Deal?&#8221; at which most of the discussion was about large amounts of proprietary text at <a href="http://www.hathitrust.org/">http://www.hathitrust.org/</a> (some of the discussion was also about large amounts of music in the <a href="http://salami.music.mcgill.ca/">SALAMI project at McGill</a>).</p>
<p>Now I am very interested in text, text retrieval (and music IR too) and I found the panel discussion most rewarding.  But it wasn&#8217;t <em>about</em> what I had been expecting it to be about (from the title) and I was perplexed by this use of the term &#8220;data&#8221; in this context. After all, the subtitle of the JCDL 2011 conference is &#8220;Bringing Together Scholars, Scholarship and Research Data&#8221;.  So the context for &#8220;data&#8221; was (for me) &#8220;research data&#8221; in the sense of the term that is pretty much the same as the first 3 sentences of the <a href="http://en.wikipedia.org/wiki/Data">Wikipedia entry for Data</a>:</p>
<blockquote><p>The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data (plural of &#8220;datum&#8221;) are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and then knowledge are derived.</p></blockquote>
<p>So I was somewhat taken aback by the argument that ensued. Everyone, it seems (except me), was quite happy to speak of &#8220;Big data&#8221; and &#8220;large amounts of text&#8221; as synonymous.  As though the streams of bytes that are common to readings from an NMR spectrometer, digital music and electronic journal articles were in all significant respects indistinguishable.</p>
<p>Of course, large volumes of byte-sequences share some kinds of problems like storage, preservation and search. But &#8220;text data&#8221; is a different kind of beast, isn&#8217;t it? For one thing, text typically has meaning &#8211; cognitive content that is different from, say, music or images or spreadsheets of temperature variations in Glasgow over the past 500 years. It has more structure too, as evidenced by how efficiently it compresses and how (relatively) easy it is to search.</p>
<p>I&#8217;m happy to speak of data <em>about</em> text that is inferred by the act of mining text.  Word frequencies, ngrams, term clusters, sentiment categories etc. fit the definition of &#8220;data&#8221; above. Even the textual &#8220;meta-data&#8221; about text is data of a certain kind. But the text itself just doesn&#8217;t seem to be that kind of thing (qualitative or quantitative attributes of a variable).</p>
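<p>To make the distinction concrete, here is a minimal sketch, in Python, of deriving data <em>about</em> a text by mining it. The sample sentence and the helper names (tokenize, word_frequencies, ngrams) are my own illustrative assumptions, not anything presented at the panel. The word counts and bigram counts it produces are quantitative attributes of variables, while the input string itself is not.</p>

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs: one number per word."""
    return Counter(tokens)


def ngrams(tokens, n=2):
    """Count each run of n consecutive tokens (bigrams when n is 2)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, made up for this illustration.
sample = "the text is not the data but the data about the text"

tokens = tokenize(sample)
freqs = word_frequencies(tokens)
bigrams = ngrams(tokens, 2)

# The derived tables are the "data"; the sentence they came from is the text.
print(freqs.most_common(3))
print(bigrams.most_common(2))
```

<p>The tables printed at the end could be graphed or compared across documents like any other measurement, which is what separates them from the sentence they were computed from.</p>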
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/06/14/what-is-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://images3.wikia.nocookie.net/__cb20061127074519/memoryalpha/en/images/1/13/Data%2C_2364.jpg" medium="image" />
	</item>
		<item>
		<title>Learning from Watson</title>
		<link>http://synthese.wordpress.com/2011/02/19/learning-from-watson/</link>
		<comments>http://synthese.wordpress.com/2011/02/19/learning-from-watson/#comments</comments>
		<pubDate>Sat, 19 Feb 2011 19:57:05 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[Statistical Semantics]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=793</guid>
		<description><![CDATA[Now that Watson has convincingly demonstrated that machines can perform some natural language tasks more effectively than humans can (see a rerun of part of Day 1 of the Jeopardy contest), what is the proper conclusion to be drawn from it? Should we join hands with &#8220;confederates&#8221; like Brian Christian and rally against the invasion [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg"><img class="alignleft size-medium wp-image-794" title="IBM_Watson" src="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg?w=162&#038;h=162" alt="Watson" width="162" height="162" /></a>Now that <a href="http://www.ibm.com/innovation/us/watson/">Watson</a> has convincingly demonstrated that machines can perform some natural language tasks more effectively than humans can (see a <a href="http://www.youtube.com/watch?v=4PSPvHcLnN0">rerun of part of Day 1</a> of the Jeopardy contest), what is the proper conclusion to be drawn from it?</p>
<p>Should we join hands with &#8220;confederates&#8221; like Brian Christian and rally against the invasion of smart machines? (See his recent piece in the <a href="http://www.theatlantic.com/magazine/archive/2011/03/mind-vs-machine/8386/">Atlantic</a> and listen to his recent <a href="http://www.cbc.ca/day6/blog/2011/02/18/interview-forget-watson-this-is-the-real-test-of-ai/">radio interview on CBC</a>.)</p>
<p>Or do we conclude that machines are now (or soon will be) sentient and deserve to be spoken to with respect for their moral standing (see Peter Singer&#8217;s article &#8220;<a href="http://www.project-syndicate.org/commentary/psinger57/English">Rights for Robots</a>&#8221;)? Or should we, like <a href="http://www.nserc-crsng.gc.ca/Prizes-Prix/Herzberg-Herzberg/Profiles-Profils/Hinton-Hinton_eng.asp">NSERC Gold Medal Award</a> winner <a href="http://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoffrey Hinton</a>, be scared about the social consequences (in the long term) of intelligent robots designed to replace soldiers (listen to his interview on the future of <a href="http://en.wikipedia.org/wiki/Artificial_intelligence">AI machines</a> on <a href="http://www.cbc.ca/video/news/audioplayer.html?clipid=1803608455">CBC&#8217;s Quirks and Quarks</a>)?</p>
<p>Before coming to any definite conclusion about how &#8220;like&#8221; us machines can be, I think we should consider how these machines do what they do.  The <a href="http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf">survey paper in AI Magazine</a> about the design of &#8220;DeepQA&#8221; by the Watson team gives some indications of the general approach:</p>
<blockquote><p>DeepQA is a massively parallel, probabilistic evidence-based architecture. For the Jeopardy Challenge, we use more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses&#8230;.</p>
<p>The overarching principles in DeepQA are <em>massive parallelism</em>, <em>many experts</em>, <em>pervasive confidence estimation</em>, and <em>integration of shallow and deep knowledge</em>.</p></blockquote>
<p>Is this the right model for creating artificial cognition? Probably not. As Maarten van Emden and I argue in a recent paper on the <a href="http://web.ncf.ca/andre/publications/ChineseRoomHumanWindow.pdf">Chinese room argument and the &#8220;Human Window&#8221;</a>, the question of whether a computer is simulating cognition cannot be decided by how effectively a computer solves a chess puzzle (for instance) but rather by the mechanism that it uses to achieve the end.</p>
<p>In this instance DeepQA uses and combines a number of different techniques from NLP, machine learning, distributed processing and decision theory &#8211; which is not likely to be an accurate representation of what humans actually do, but it is undeniably successful at that task (see <a href="http://www.youtube.com/watch?v=v5CPGMZteFQ&amp;feature=player_embedded">this talk on YouTube</a> about how IBM addressed the Jeopardy problem).</p>
<p>Geoff Hinton (in the radio interview mentioned above) speculates that Watson is a feat of special-purpose engineering but that the general-purpose solution &#8211; a large neural network that simulates the learning abilities of the brain &#8211; is what the project of AI is really about.</p>
<p>What we suggest in our Human Window paper is that one criterion we can use to determine whether machines are performing adequate simulations of what humans do is whether or not humans are able to follow the steps that the machine is undertaking. On that criterion, I think it&#8217;s safe to say that Watson &#8211; although very impressive &#8211; isn&#8217;t quite there yet.</p>
<p>P.S. If you have the patience, I recommend watching a <a href="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov">BBC debate</a> from 1973 between <a href="http://en.wikipedia.org/wiki/James_Lighthill">Sir James Lighthill</a>, <a href="http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)">John McCarthy</a> and <a href="http://en.wikipedia.org/wiki/Donald_Michie">Donald Michie</a> about whether AI is possible. The context of this video is the &#8220;Lighthill Affair&#8221; in 1972, recently <a href="http://vanemden.wordpress.com/2011/02/18/from-the-chronicles-of-scruffy-versus-neat-the-lighthill-affair/">chronicled on van Emden&#8217;s blog</a> (note that the audio on this thumbnail video is rather out of synch!).</p>
<p>It&#8217;s amazing how spectacularly wrong an amateur in artificial intelligence (Prof. Lighthill was an applied mathematician specializing in fluid dynamics) can be about the possibility of machines simulating intelligent behaviour. It is a real tragedy that Sir Lighthill&#8217;s ideological biases had such disastrous consequences for AI research funding in the UK. The attitude of Sir Lighthill reminds me of <a href="http://en.wikipedia.org/wiki/Samuel_Wilberforce">Samuel Wilberforce</a>&#8217;s objections to Darwin&#8217;s theory of evolution. I find it astonishing that this BBC debate was so civilized in its demeanour.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/02/19/learning-from-watson/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
<enclosure url="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov" length="169265179" type="video/quicktime" />
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg?w=300" medium="image">
			<media:title type="html">IBM_Watson</media:title>
		</media:content>
	</item>
		<item>
		<title>Mendeley Data vs. Netflix Data</title>
		<link>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/</link>
		<comments>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 01:05:30 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Citation]]></category>
		<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Digital library]]></category>
		<category><![CDATA[Recommender]]></category>
		<category><![CDATA[Recommender service]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=766</guid>
		<description><![CDATA[Mendeley, the on-line reference management software and social networking site for science researchers, has generously offered up a reference dataset with which developers and researchers can conduct experiments on recommender systems. This release of data is their reply to the DataTel Challenge put forth at the 2010 ACM Recommender Systems Conference in Barcelona. The paper published by computer scientists at Mendeley, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mendeley.com"><img class=" alignleft" src="http://www.mendeley.com/graphics/commonnew/logo-mendeley_1284377719.png" alt="" width="345" height="81" /></a></p>
<p><a href="http://www.mendeley.com/">Mendeley</a>, the on-line reference management software and social networking site for science researchers, has generously offered up a <a href="http://dev.mendeley.com/datachallenge/">reference dataset</a> with which developers and researchers can conduct experiments on recommender systems. This release of data is their reply to the <a href="http://adenu.ia.uned.es/workshops/recsystel2010/datatel.htm">DataTel Challenge</a> put forth at the 2010 ACM Recommender Systems Conference in Barcelona.</p>
<p>The paper published by computer scientists at Mendeley, which accompanies the dataset (<a href="http://www.mendeley.com/research/sei-whale/">bibliographic reference</a> and <a href="http://www.mendeley.com/download/public/19900/3568420111/713e027c0c0b195d08f87da30f65bd668a3784a1/dl.pdf">full PDF</a>), describes the dataset as containing boolean ratings (read / unread or starred / unstarred) for about 50,000 (anonymized) users and references to about 4.8M articles (also anonymized), 3.6M of which are unique.</p>
<p>I was gratified to note that this is almost exactly the user-item ratio (1:100) that I indicated in my <a href="http://goo.gl/Bc64">poster at ASIS&amp;T 2010</a> was typically the cause of the data sparsity problem for recommenders in digital libraries. If we measure the sparseness of a dataset by the number of edges in the bipartite user-item graph divided by the total number of possible edges, Mendeley gives 2.66E-05. Compared with the sparsity of Netflix &#8211; 1.18E-02 &#8211; that&#8217;s a difference of between 2 and 3 orders of magnitude (a factor of roughly 440)!</p>
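<p>A minimal sketch of that sparsity calculation in Python, using only the rounded counts quoted in this post (about 4.8M references, 50,000 users and 3.6M unique articles for Mendeley; about 100M ratings, 480K users and 17K titles for Netflix). The function name <code>sparsity</code> is hypothetical, and because the inputs are rounded the results land close to, but not exactly on, the quoted figures.</p>

```python
def sparsity(n_edges, n_users, n_items):
    """Fraction of all possible user-item edges that are actually present."""
    return n_edges / (n_users * n_items)


# Rounded counts taken from this post; the exact dataset counts differ slightly.
mendeley = sparsity(4.8e6, 50_000, 3.6e6)
netflix = sparsity(100e6, 480_000, 17_000)

print(f"Mendeley: {mendeley:.2e}")
print(f"Netflix:  {netflix:.2e}")
print(f"Netflix is about {netflix / mendeley:.0f} times denser")
```

<p>By this measure a smaller value means a sparser graph, which is why Mendeley is the harder case.</p>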
<p>But raw sparsity is not all that matters. The number of users per movie is much more evenly distributed in Netflix than the number of readers per article in Mendeley, i.e.  the user-item graph in Netflix is more connected (in the sense that the probability of creating a disconnected graph by deleting a random edge is much lower).</p>
<p>In the Mendeley data, out of the 3,652,286 unique articles, 3,055,546 (83.7%) were referenced by only 1 user and 378,114 were referenced by only 2 users. Fewer than 6% of the articles referenced were referenced by 3 or more users. [The most frequently referenced article was referenced 19,450 times!]</p>
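<p>The per-article reader counts behind these percentages can be tallied along the following lines. This is only a sketch on a hypothetical toy dataset (the <code>libraries</code> mapping and its IDs are invented for illustration); it does not read the Mendeley files or assume their format.</p>

```python
from collections import Counter

# Hypothetical toy data: each user maps to the set of article IDs they reference.
libraries = {
    "u1": {"a1", "a2", "a3"},
    "u2": {"a1", "a4"},
    "u3": {"a1", "a2", "a5"},
    "u4": {"a6"},
}

# Number of distinct users referencing each article.
readers_per_article = Counter(
    article for articles in libraries.values() for article in articles
)

# Histogram: how many articles have exactly k readers.
histogram = Counter(readers_per_article.values())

total = len(readers_per_article)
for k in sorted(histogram):
    share = 100 * histogram[k] / total
    print(f"{k} reader(s): {histogram[k]} articles ({share:.1f}%)")
```

<p>On real data the same histogram shows how much of the catalogue sits in the &#8220;one reader only&#8221; bucket.</p>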
<p style="text-align:center;"><a href="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg"><img class="size-full wp-image-772  aligncenter" title="Mendeley-Articles" src="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg?w=460&#038;h=262" alt="" width="460" height="262" /></a></p>
<p>By comparison, in the Netflix dataset (which contains ~100M ratings from ~480K users on ~17K titles), over 89% of the movies had been rated by 20 or more users. (See <a href="http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/">this blog post</a> for more aggregate statistics on Netflix data.)</p>
<p style="text-align:center;"><a href="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg"><img class="alignnone size-full wp-image-789" title="Netflix-movies1" src="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg?w=460&#038;h=243" alt="" width="460" height="243" /></a></p>
<p>I think that user or item similarity measures aren&#8217;t going to work well with the kind of distribution we find in Mendeley data. Some additional information such as article citation data or some content attribute such as the categories to which the articles belong is going to be needed to get any kind of reasonable accuracy from a recommender system.</p>
<p>Or, it could be that some method like the heat-dissipation technique introduced by physicists in the paper &#8220;<a href="http://doc.rero.ch/lm.php?url=1000,43,2,20100318115452-QH/zha_sad.pdf">Solving the apparent diversity-accuracy dilemma of recommender systems</a>&#8221; published in the Proceedings of the National Academy of Sciences (PNAS) could work on such a sparse and loosely connected dataset. The authors claim that this approach works especially well for sparse bipartite graphs (with no ratings information). We&#8217;ll have to try and see.</p>
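<p>For readers who want a feel for the idea, here is a rough Python sketch of a heat-spreading step on an unweighted bipartite user-item graph. It follows my reading of the technique (two successive averaging steps, user side then item side) and is not the authors&#8217; code; the function name <code>heat_scores</code> and the toy <code>libraries</code> data are hypothetical, and the hybrid with a second method that the paper also describes is left out.</p>

```python
def heat_scores(libraries, target_user):
    """Score unseen items for target_user by two averaging steps on the bipartite graph.

    Assumes every user in `libraries` has at least one item.
    """
    # Invert the user-to-items map into an item-to-users map.
    readers = {}
    for user, items in libraries.items():
        for item in items:
            readers.setdefault(item, set()).add(user)

    # Initial resource: 1 on each item the target user already has, 0 elsewhere.
    owned = libraries[target_user]

    # Step 1: each user receives the average resource of the items in their library.
    user_heat = {
        user: sum(1 for item in items if item in owned) / len(items)
        for user, items in libraries.items()
    }

    # Step 2: each item receives the average resource of the users who hold it.
    item_heat = {
        item: sum(user_heat[u] for u in users) / len(users)
        for item, users in readers.items()
    }

    # Rank only the items the target user does not already have.
    candidates = [(i, s) for i, s in item_heat.items() if i not in owned]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data, invented for illustration.
libraries = {
    "u1": {"a", "b"},
    "u2": {"a", "c"},
    "u3": {"b", "c", "d"},
}
ranking = heat_scores(libraries, "u1")
print(ranking)
```

<p>Because the second step averages over an item&#8217;s readers instead of summing, items with few readers are not penalised for being unpopular, which is the property that makes the approach interesting for data like Mendeley&#8217;s.</p>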
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.mendeley.com/graphics/commonnew/logo-mendeley_1284377719.png" medium="image" />

		<media:content url="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg" medium="image">
			<media:title type="html">Mendeley-Articles</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg" medium="image">
			<media:title type="html">Netflix-movies1</media:title>
		</media:content>
	</item>
		<item>
		<title>Ex Libris &#8216;bX&#8217; Recommender Promo Video</title>
		<link>http://synthese.wordpress.com/2010/10/05/ex-libris-bx-promo-video/</link>
		<comments>http://synthese.wordpress.com/2010/10/05/ex-libris-bx-promo-video/#comments</comments>
		<pubDate>Tue, 05 Oct 2010 18:33:26 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Recommender]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=747</guid>
		<description><![CDATA[I stumbled across this Ex Libris promo video for its &#8216;bX&#8217; recommender yesterday. Having done quite a few of these use-case demo scenarios to &#8220;show the value&#8221;, I appreciate how hard it is to pitch a relatively complex idea in straightforward terms. I think it does a pretty good job too, notwithstanding the slightly over-the-top-happiness [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2010/10/bx-logo.jpg"><img class="alignleft size-full wp-image-755" title="bx-logo" src="http://synthese.files.wordpress.com/2010/10/bx-logo.jpg?w=460" alt=""   /></a>I stumbled across this Ex Libris promo video for its &#8216;bX&#8217; recommender yesterday. Having done quite a few of these use-case demo scenarios to &#8220;show the value&#8221;, I appreciate how hard it is to pitch a relatively complex idea in straightforward terms. I think it does a pretty good job too, notwithstanding the slightly over-the-top-happiness tenor of the whole thing.</p>
<span style="text-align:center; display: block;"><a href="http://synthese.wordpress.com/2010/10/05/ex-libris-bx-promo-video/"><img src="http://img.youtube.com/vi/YvrPhATtGvY/2.jpg" alt="" /></a></span>
<p>At the risk of repeating myself, though, there&#8217;s one thing that the video glosses over. <a href="http://en.wikipedia.org/wiki/SFX_(software)">SFX</a> logs are, effectively, click-logs and clicks have two sources: search engine results and &#8216;bX&#8217; recommendations themselves. Hence &#8216;bX&#8217; recommendations are more likely to be &#8220;semantically homogeneous&#8221; (although less so than pure search results) because the data they derive from is biased by search-engine ranking. The proportion of SFX traffic that is generated by the recommender itself further narrows the semantic diversity of recommendations.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/10/05/ex-libris-bx-promo-video/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/10/bx-logo.jpg" medium="image">
			<media:title type="html">bx-logo</media:title>
		</media:content>
	</item>
		<item>
		<title>The Cost (vs. Value) of Data Curation</title>
		<link>http://synthese.wordpress.com/2010/10/02/the-cost-of-data-curation/</link>
		<comments>http://synthese.wordpress.com/2010/10/02/the-cost-of-data-curation/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 16:45:16 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=726</guid>
		<description><![CDATA[There is a tension between the cost (to the curator) of data-curation and the potential value (to others) of making data (e.g. data from scientific experiments) available. For the purposes of selection, it would be nice to know ahead of time whether the data you wish to make available (now) is ever going to have [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://www.dlib.org/dlib/july04/beagrie/fig-2.gif" alt="" width="254" height="176" />There is a tension between the cost (to the curator) of data-curation and the potential value (to others) of making data (e.g. data from scientific experiments) available. For the purposes of selection, it would be nice to know ahead of time whether the data you wish to make available (now) is ever going to have value (in the future).</p>
<p>Unfortunately, you can&#8217;t predict that ahead of time because you don’t know (i) who your data-users might turn out to be or (ii) how circumstances might change to turn what was previously an irrelevant-seeming piece of data into a planet-saving one.</p>
<p>Indeed it&#8217;s impossible to know how <em>any </em>element of data might be used for any given purpose and by whom. For instance, consider whether the Nuclear Magnetic Resonance spectra that you have collected for the purpose of <a href="http://article.pubs.nrc-cnrc.gc.ca/ppv/RPViewDoc?issn=1480-3291&amp;volume=80&amp;issue=8&amp;startPage=949">analyzing the structure and composition of a pathogen</a> might not be fruitfully reused in the future for the purpose of understanding the bias of an improperly calibrated instrument or indeed (for technology historians of the future) how the (what may then be &#8220;primitive&#8221;) NMR spectroscopy technology was used in the 20th and early 21st centuries.</p>
<p>So, how are we to interpret Weinberger’s advice in “<a href="http://www.everythingismiscellaneous.com/">Everything is Miscellaneous</a>”?:</p>
<ul>
<li>“The solution to overabundance of information is more information”</li>
<li>“Filter on the way out, not on the way in”</li>
<li>“Put each leaf on as many branches as possible”</li>
<li>“Everything is metadata and can be a label”</li>
<li>“Give up control”</li>
<li>“A ‘topic’ is anything someone somewhere is interested in.”</li>
</ul>
<p>The cash value of this advice for data: publish as much data as you can; give users as many ways as you can to let them get at it (e.g. APIs but also user-interfaces); give users as many ways as you can to add more data (tags, metadata, text, links to other data &#8211; viz. &#8220;<a href="http://linkeddata.org/">linked data</a>&#8221;).</p>
<p>Which is fine advice if you assume that publishing data, like putting text and images on the internet, is (almost) free. But publishing data isn&#8217;t (yet) close to free. Why? Because it (still) needs to be curated by someone who understands how to annotate it in at least the obvious ways in which it may be useful &#8211; e.g. to other contemporary scientists.</p>
<p>Prediction: either scientists will have to be trained to become data-curators <strong><em>or</em></strong> the process of creating data will have to generate the metadata <em><strong>or</strong></em> e-librarians will have to train in the sciences (inclusive sense of &#8220;or&#8221;).</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/10/02/the-cost-of-data-curation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.dlib.org/dlib/july04/beagrie/fig-2.gif" medium="image" />
	</item>
		<item>
		<title>Are User-Based Recommenders Biased by Search Engine Ranking?</title>
		<link>http://synthese.wordpress.com/2010/09/28/are-user-based-recommenders-biased-by-search-engine-ranking/</link>
		<comments>http://synthese.wordpress.com/2010/09/28/are-user-based-recommenders-biased-by-search-engine-ranking/#comments</comments>
		<pubDate>Tue, 28 Sep 2010 14:00:54 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Recommender]]></category>
		<category><![CDATA[Recommender service]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Semantics]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=729</guid>
		<description><![CDATA[I have a hypothesis (first proposed here) that I would like to test with data from query logs: user-based recommenders &#8211; such as the &#8216;bX&#8217; recommender for journal articles &#8211; are biased by search-engine language models and ranking algorithms. Let&#8217;s say you are looking for &#8220;multiple sclerosis&#8221; and you enter those terms as a search [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://blsciblogs.baruch.cuny.edu/kmoriah/files/2010/03/grave-injustice-school-choice-south-carolina.jpg" alt="" width="243" height="194" />I have a hypothesis (first proposed <a href="http://web.ncf.ca/an386/publications/ASIST2010-vellino-poster.pdf">here</a>) that I would like to test with data from query logs: user-based recommenders &#8211; such as the <a href="http://www.exlibrisgroup.com/category/bXOverview">&#8216;bX&#8217; recommender for journal articles</a> &#8211; are biased by search-engine language models and ranking algorithms.</p>
<p>Let&#8217;s say you are looking for &#8220;multiple sclerosis&#8221; and you enter those terms as a search query. Some of the articles presented to you in the search results will likely be relevant, and you download a few of them during your session. This may be followed by another, semantically germane query that yields more article downloads. As a consequence, the usage-log (e.g. the SFX log used by &#8216;bX&#8217;) is going to register these articles as having been &#8220;co-downloaded&#8221;. Which is natural enough.</p>
<p>But if this happens a lot, then a collaborative filtering recommender is going to generate recommendations that are biased by the ranking algorithm and language model that produced the search-result ranking: even by PageRank, if you&#8217;re using Google.</p>
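<p>To make the mechanism concrete, here is a small Python sketch of how co-download counts could be accumulated from session logs and then turned into recommendations. It is an illustration under my own assumptions, not a description of how &#8216;bX&#8217; processes SFX logs; the session data, the article IDs and the <code>recommend</code> helper are all hypothetical.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the article IDs downloaded during one search session.
sessions = [
    ["ms-101", "ms-205", "ms-309"],
    ["ms-101", "ms-205"],
    ["ms-309", "neuro-77"],
]

co_downloads = Counter()
for articles in sessions:
    # Every unordered pair of distinct articles in a session counts as one co-download.
    for pair in combinations(sorted(set(articles)), 2):
        co_downloads[pair] += 1


def recommend(article, top_n=3):
    """Return the articles most often co-downloaded with the given one."""
    scores = Counter()
    for (first, second), count in co_downloads.items():
        if first == article:
            scores[second] += count
        elif second == article:
            scores[first] += count
    return [other for other, _ in scores.most_common(top_n)]


print(recommend("ms-101"))
```

<p>Since the sessions contain only what the search engine surfaced, the pair counts (and hence the recommendations) can never reach articles that the ranking kept off the results page.</p>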
<p>In contrast, a citation-based (i.e. author-centric) recommender (such as <a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/">Sarkanto</a>) will likely yield more semantically diverse recommendations because co-citations will have (we hope!) originated from deeper semantic relations (i.e. non-obvious but meaningful connections between the items cited in the bibliography).</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/09/28/are-user-based-recommenders-biased-by-search-engine-ranking/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://blsciblogs.baruch.cuny.edu/kmoriah/files/2010/03/grave-injustice-school-choice-south-carolina.jpg" medium="image" />
	</item>
		<item>
		<title>Scientific Data is Interpreted</title>
		<link>http://synthese.wordpress.com/2010/09/26/scientific-data-is-interpreted/</link>
		<comments>http://synthese.wordpress.com/2010/09/26/scientific-data-is-interpreted/#comments</comments>
		<pubDate>Sun, 26 Sep 2010 17:03:23 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Epistemology]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=707</guid>
		<description><![CDATA[It must be a truism by now that there is no such thing as theory-free observation. Scientific data is necessarily tied up with the theories that are required to interpret it and which led to its discovery. By analogy I would argue that scientific data sets are useless unless they are interpreted. There is no such [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://www.faqs.org/photo-dict/photofiles/list/3311/4401running_hourglass.jpg" alt="" width="142" height="212" />It must be a truism by now that there is <a href="http://plato.stanford.edu/entries/science-theory-observation/">no such thing as theory-free observation</a>. Scientific data is necessarily tied up with the theories that are required to interpret it and which led to its discovery.</p>
<p>By analogy I would argue that scientific data sets are useless unless they are interpreted. There is no such thing as a useful &#8220;raw&#8221; data set.</p>
<p>Consider for instance the <a href="http://time5.nrc.ca/timefreq/IERS.html">data on leap seconds from the National Research Council</a>. It&#8217;s a simple enough table: there are three columns (Date, UTC Leap Seconds, and MJD) and only a few dozen rows. Here are two such rows:</p>
<pre style="text-align:center;"><span style="text-decoration:underline;">         DATE     UTC Leap Seconds     MJD      </span>
2006-01-01 - 2009-01-01   33     53 736 - 54 832
1999-01-01 - 2006-01-01   32     51 179 - 53 736</pre>
<p>The first question for the uninitiated in the measurement of time: what is a &#8220;UTC Leap Second&#8221;? It&#8217;s easy enough to <a href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time">look up</a> and learn that UTC</p>
<blockquote><p>is a time standard based on <a href="http://en.wikipedia.org/wiki/International_Atomic_Time">International Atomic Time</a> (TAI) with leap seconds added at irregular intervals to compensate for the Earth&#8217;s slowing rotation.</p></blockquote>
<p>Ah, so this was news to me: the earth&#8217;s rotation is slowing down! (&#8220;the solar day becomes 1.7 ms longer every century due mainly to tidal friction (2.3 ms/cy, reduced by 0.6 ms/cy due to glacial rebound)&#8221;).</p>
<p>The (implicit) frame of reference for (exact) time with respect to which the earth is slowing down is the <a href="http://tycho.usno.navy.mil/cesium.html">atomic (cesium) clock</a>, which requires an understanding of the highly theoretical processes of quantum mechanics to interpret correctly.</p>
<p>So now we have an inkling of what the data means. It gives us the difference between two time measurements &#8211; those from atomic clocks and those from the earth&#8217;s rotation. A first attempt at interpreting the first row in the table is: it took 3 years between 2006-01-01 and 2009-01-01 to add one leap-second to the calendar date.</p>
<p>A little &#8220;Binging&#8221; (I&#8217;ve all but abandoned &#8220;Googling&#8221; since Google became &#8220;instant&#8221; &#8211; not because it can&#8217;t be turned off, but to make a statement to Google) yields &#8220;<a href="http://tycho.usno.navy.mil/mjd.html">Modified Julian Day</a>&#8221; for MJD.  So the third column is primarily a conversion of the first column into a standard, though not without its own theoretical reasons for being the preferred measure.</p>
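<p>The relationship between the first and third columns can be checked with a few lines of Python. This is a sketch using only the standard library; the helper name <code>to_mjd</code> is hypothetical, and it relies on the convention that Modified Julian Day 0 begins at midnight on 1858-11-17.</p>

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # Modified Julian Day 0 begins at midnight on this date


def to_mjd(day):
    """Modified Julian Day number for a calendar date (taken at 00:00)."""
    return (day - MJD_EPOCH).days


# The two date ranges shown in the table excerpt.
rows = [
    (date(2006, 1, 1), date(2009, 1, 1)),
    (date(1999, 1, 1), date(2006, 1, 1)),
]
for start, end in rows:
    print(start, "to", end, "is MJD", to_mjd(start), "to", to_mjd(end))
```

<p>The results (53736 to 54832, and 51179 to 53736) match the MJD column, which confirms that the third column is just the first column expressed as a day count.</p>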
<p>All this to say &#8211; repositories of datasets without (substantial) amounts of textual metadata, not to mention software and tools designed for their interpretation and navigation, are going to be (at best) not very useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/09/26/scientific-data-is-interpreted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.faqs.org/photo-dict/photofiles/list/3311/4401running_hourglass.jpg" medium="image" />
	</item>
		<item>
		<title>Sarkanto Scientific Search</title>
		<link>http://synthese.wordpress.com/2010/09/13/sarkanto-scientific-search/</link>
		<comments>http://synthese.wordpress.com/2010/09/13/sarkanto-scientific-search/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 14:26:21 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Digital library]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Recommender]]></category>
		<category><![CDATA[Recommender service]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=685</guid>
		<description><![CDATA[A few weeks ago I finished deploying a version of a collaborative recommender system that uses only article citations as a basis for recommending journal articles.  This tool allows you to search ~ 7 million STM (Scientific Technical and Medical) articles up to Dec. 2009 and to compare citation-base recommendations (using the Synthese recommender) with [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&amp;blog=666986&amp;post=685&amp;subd=synthese&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;"><a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/"><img class="aligncenter size-full wp-image-700" title="sarkanto-search" src="http://synthese.files.wordpress.com/2010/09/sarkanto-search.jpg?w=460&#038;h=129" alt="" width="460" height="129" /></a>A few weeks ago I finished deploying a version of a collaborative recommender system that uses only article citations as a basis for recommending journal articles.  This tool allows you to search ~ 7 million STM (Scientific, Technical and Medical) articles up to Dec. 2009 and to compare citation-based recommendations (using the Synthese recommender) with <a href="http://www.exlibrisgroup.com/category/bXOverview">recommendations generated by &#8216;bX&#8217;</a> (a user-based collaborative recommender from Ex Libris).  You can <a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/">try the Sarkanto demo</a> and <a href="http://web.ncf.ca/an386/publications/ASIST2010-vellino-poster.pdf">read more about how &#8216;bX&#8217; and Sarkanto compare</a>.</p>
<p>Note that I&#8217;m also using this implementation to experiment with the <a href="http://code.google.com/p/google-api-translate-java/">Google Translate API</a> and the <a href="http://www.microsofttranslator.com/">Microsoft Translator</a>, both to expand queries into the other Canadian official language and to translate various bibliographic fields when returning search results.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/09/13/sarkanto-scientific-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/09/sarkanto-search.jpg" medium="image">
			<media:title type="html">sarkanto-search</media:title>
		</media:content>
	</item>
		<item>
		<title>Feedback Effects in Google Instant</title>
		<link>http://synthese.wordpress.com/2010/09/08/feedback-effects-in-google-instant/</link>
		<comments>http://synthese.wordpress.com/2010/09/08/feedback-effects-in-google-instant/#comments</comments>
		<pubDate>Thu, 09 Sep 2010 01:58:02 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=687</guid>
		<description><![CDATA[I haven&#8217;t experimented with Google Instant long enough to tell if I will like it over the long run, but it certainly is an extraordinary feat of engineering!  This new feature &#8211; which uses the &#8220;Google Suggest&#8221; auto-completion feature and Ajax to give you &#8220;instant&#8221; search results based on just the first few characters of your [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://www.devicemag.com/wp-content/uploads/2010/08/google-instant-search-feature-update-485x363.jpg" alt="" width="262" height="196" />I haven&#8217;t experimented with <a href="http://www.google.com/instant/#utm_campaign=launch&amp;utm_medium=van&amp;utm_source=instant">Google Instant</a> long enough to tell if I will like it over the long run, but it certainly is an extraordinary feat of engineering!  This new feature &#8211; which uses the &#8220;<a href="http://www.google.com/support/websearch/bin/answer.py?hl=en&amp;answer=106230">Google Suggest</a>&#8221; auto-completion feature and Ajax to give you &#8220;instant&#8221; search results based on just the first few characters of your search query &#8211; imposes a dramatic load increase on Google servers. Yet clever engineering feats in caching and efficient query optimization have produced the desired scalability results and it is impressive to use.</p>
<p>(BTW &#8211; If you want to try &#8220;Google Instant&#8221; and you are in a country that doesn&#8217;t have it yet, try &#8220;/ncr&#8221; (no country redirect), i.e. &#8220;<a href="http://www.google.com/ncr">http://www.google.com/ncr</a>&#8221;)</p>
<p>One effect that is sure to manifest over time is a feedback loop that &#8220;Google Instant&#8221; will have on &#8220;Google Suggest&#8221;. Just as people (mostly) currently click on one of the top-10 search results, so I expect most users will increasingly search for what Google suggests rather than their own terms and expressions, thus narrowing the range of options that &#8220;Suggest&#8221; can offer users over time.</p>
<p>One (interesting) issue is going to be: does &#8220;Instant&#8221; degrade the quality of &#8220;Suggest&#8221;? That is, the more people use &#8220;Instant&#8221;, the more the &#8220;top-N&#8221; suggested terms are reinforced, thus thinning out the &#8220;long tail&#8221; of queries.  Is &#8220;Instant&#8221; going to increasingly cater to the lowest common denominator?</p>
<p>The demos given at the <a href="http://www.youtube.com/watch?v=i0eMHRxlJ2c">Google Instant launch by Google executives</a> showed off how just typing &#8220;w&#8221; results in an instant and prescient result for &#8220;The Weather Network&#8221; (which, surprise, is what that demo scenario has you wanting!). I thought it might be interesting to find out what Google Instant produces with each of the 26 letters of the alphabet.  Here are the results:</p>
<blockquote>
<div>A: Amazon.com: Online Shopping for Electronics &#8230;<br />
B: Best Buy: TVs, Digital Cameras  &#8230;<br />
C: craigslist: los angeles classifieds for jobs &#8230;<br />
D: Dictionary.com | Find the Meanings &#8230;<br />
E: eBay &#8211; New &amp; used electronics, cars,  &#8230;<br />
F: Welcome to Facebook<br />
G: Gmail: Email from Google<br />
H: Windows Live Hotmail<br />
I: Welcome to IKEA.com<br />
J: JetBlue | Airline Tickets, Flights, and Airfare<br />
K: Kohl&#8217;s<br />
L: Lowe&#8217;s Home Improvement: Appliances, Tools&#8230;</div>
<div>M: MapQuest Maps &#8211; Driving Directions &#8211; Map<br />
N: Netflix &#8211; TV &amp; movies instantly streamed online  &#8230;<br />
O: Orbitz Travel: Airline Tickets, Cheap Hotels &#8230;<br />
P: Pandora Radio &#8211; Listen to Free Internet Radio &#8230;<br />
Q: Famous Quotes and Quotations at BrainyQuote<br />
R: REI &#8211; Outdoor Gear, Equipment  &#8230;<br />
S: Sears: Appliances, Tools, Electronics &#8230;<br />
T: Target.com &#8211; Furniture, Baby, Toys &#8230;<br />
U: USPS &#8211; The United States Postal Service &#8230;<br />
V: Verizon | Broadband (DSL) Internet Service &#8230;<br />
W: Current Weather &#8211; The Weather Network<br />
X: Xbox.com | Home<br />
Y: Yahoo!<br />
Z: Zillow &#8211; Real Estate, Homes for Sale &#8230;</div>
</blockquote>
<p>&#8220;Suggest&#8221; results are clearly dominated by big on-line businesses: Sears, Verizon, Microsoft, Facebook, Amazon, eBay&#8230;. Is that really what most Google users search for most of the time? If so, I despair for the democratic internet.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/09/08/feedback-effects-in-google-instant/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.devicemag.com/wp-content/uploads/2010/08/google-instant-search-feature-update-485x363.jpg" medium="image" />
	</item>
		<item>
		<title>Programming Language Seduction</title>
		<link>http://synthese.wordpress.com/2010/09/08/programming-language-seduction/</link>
		<comments>http://synthese.wordpress.com/2010/09/08/programming-language-seduction/#comments</comments>
		<pubDate>Wed, 08 Sep 2010 16:24:17 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Constraints]]></category>
		<category><![CDATA[Knowledge Representation]]></category>
		<category><![CDATA[Logic]]></category>
		<category><![CDATA[Logic Programming]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=663</guid>
		<description><![CDATA[Maarten van Emden has written a new piece, &#8220;The Fatal Choice&#8221; (companion to his earlier post &#8220;Who Killed Prolog&#8221;) which aims at explaining the fierce loyalty that devotees have to programming languages such as Lisp and Prolog. One can&#8217;t help but notice in this post a slight undertow of resentment &#8211; as if Maarten had [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://www.math.hawaii.edu/LatThy/jb-lat.gif" alt="" width="273" height="162" />Maarten van Emden has written a new piece, &#8220;<a href="http://vanemden.wordpress.com/2010/08/31/the-fatal-choice/">The Fatal Choice</a>&#8221; (companion to his earlier post &#8220;<a href="http://vanemden.wordpress.com/2010/08/21/who-killed-prolog/">Who Killed Prolog</a>&#8221;) which aims at explaining the fierce loyalty that devotees have to programming languages such as Lisp and Prolog.</p>
<p>One can&#8217;t help but notice in this post a slight undertow of resentment &#8211; as if Maarten had been jilted. To wit:</p>
<blockquote><p>The US had acted as transmitter of the Prolog bug without being itself infected.</p></blockquote>
<p>&#8230; as though Prolog were a European disease against which Americans (but not Canadians) were (thankfully) immune.</p>
<p>And again:</p>
<blockquote><p>The examples that follow are not useful, but are chosen to convey the seductive nature of Prolog.</p></blockquote>
<p>&#8230; as though she were an evil temptress whose beauty is merely skin deep!</p>
<p>As a lover (of Prolog) myself, I can definitely attest to its seductiveness.  I still love Prolog &#8211; for some of the same reasons that the subsequent generation of Python and Ruby aficionados love their languages: freedom from the limitations of strict typing, dynamic interpretation, conciseness&#8230; Shall I compare thee to a summer&#8217;s day?</p>
<p>But what has smitten me for life is not really unification and backtracking &#8211; it&#8217;s Constraint Programming.  There are lots of <a href="http://4c.ucc.ie/web/archive/solver.jsp">programming languages designed for solving constraint satisfaction problems</a> (CSPs). There are even <a href="http://www.solver.com/index.html">spreadsheet plugins for solving CSPs</a>.  But what makes Logic Programming the perfect paradigm for constraint programming &#8211; and spreadsheets too, actually, for the same reason &#8211; is the declarative nature of relations between constrained variables.</p>
<p>One of my favourite toy programs that illustrate the declarative character of CSPs in a logic programming framework solves the <a href="http://en.wikipedia.org/wiki/Eight_queens_puzzle">N-Queens problem</a>.  The program below can be run with the Finite Domain (FD) solver in <a href="http://www.gprolog.org/">GNU Prolog</a>.</p>
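<pre><span style="font-size:small;">% queens(N, L): L holds, for each column of an N-by-N board,
% the row of the queen placed in that column.
queens(N, L) :-
    length(L, N),
    domain(L, 1, N),
    safe(L),
    fd_labeling(L).

% safe(Qs): no two queens in Qs attack each other.
safe([]).
safe([X|Xs]) :-
    safe_between(X, Xs, 1),
    safe(Xs).

% safe_between(X, Ys, M): X attacks none of Ys, the first of which
% is M columns away from X.
safe_between(_, [], _).
safe_between(X, [Y|Ys], M) :-
    no_attack(X, Y, M),
    M1 is M+1,
    safe_between(X, Ys, M1).

% no_attack(X, Y, N): queens N columns apart share no row or diagonal.
no_attack(X, Y, N) :-
    X #\= Y,
    X+N #\= Y,
    X-N #\= Y.

% domain(L, S, E): every variable in L ranges over S..E.
domain(L, S, E) :-
    values(S, E, V),
    fd_domain(L, V).

% values(N, M, Vs): Vs is the list of integers from M down to N.
values(N, N, [N]) :- !.
values(N, M, [M|R]) :-
    L is M - 1,
    values(N, L, R).</span></pre>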
<p>I like this program not because it is 66% shorter than the equivalent (backtracking and recursive) <a href="http://www.cs.princeton.edu/introcs/23recursion/Queens.java.html">Java program</a> but because the constraints are laid out declaratively as inequality constraints (in &#8220;no_attack&#8221;) and the work is done by a constraint propagation engine whose conceptual model is a natural extension of <a href="http://en.wikipedia.org/wiki/Unification_(computing)">unification</a>.  In fact, constraint propagation on finite domains and on closed intervals on the real line has a semantic model given by the theory of <a href="http://en.wikipedia.org/wiki/Lattice_(order)">lattices</a>, which Bill Older and I wrote up in a <a href="http://web.ncf.ca/andre/publications/wclp4.pdf">paper in 1994</a>.</p>
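<p>To make the role of the propagation engine concrete, here is a minimal sketch for GNU Prolog&#8217;s FD solver. The predicate name &#8220;propagate&#8221; and the 4-by-4 board are hypothetical, chosen only for illustration. It posts the same three constraints as &#8220;no_attack&#8221; for two queens in adjacent columns, fixes the first queen and reads back the second queen&#8217;s domain without ever labeling it.</p>
<pre><span style="font-size:small;">% propagate(Dom): Dom is the domain left for the second queen
% once the first queen is fixed in row 1 (hypothetical helper).
propagate(Dom) :-
    fd_domain([X, Y], 1, 4),   % two queens on a 4-by-4 board
    X #\= Y,                   % not in the same row
    X + 1 #\= Y,               % not on one diagonal
    X - 1 #\= Y,               % nor on the other
    X = 1,                     % fix the first queen; Y is never labeled
    fd_dom(Y, Dom).            % ask the solver what is left for Y</span></pre>
<p>The query &#8220;propagate(D)&#8221; should answer D = [3,4]: rows 1 and 2 are removed by propagation alone, before any search takes place.</p>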
<p>What I find seductive about constraint logic programming is the mathematical elegance of its underlying semantics and I haven&#8217;t yet fallen out of love.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/09/08/programming-language-seduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.math.hawaii.edu/LatThy/jb-lat.gif" medium="image" />
	</item>
		<item>
		<title>Scientific Research Data</title>
		<link>http://synthese.wordpress.com/2010/08/23/scientific-research-data/</link>
		<comments>http://synthese.wordpress.com/2010/08/23/scientific-research-data/#comments</comments>
		<pubDate>Mon, 23 Aug 2010 18:53:21 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Information]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=614</guid>
		<description><![CDATA[Scientific research data is, without a doubt, a central component in the lifecycle of knowledge production. For one thing, scientific data is critical to the corroboration (or falsification) of theories. Equally important to the process of scientific inquiry is making this data openly available to others &#8211; as is vividly demonstrated by the so-called &#8220;ClimateGate&#8221; controversy and the more [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2010/07/forward.jpg"><img class="alignleft size-medium wp-image-648" title="forward" src="http://synthese.files.wordpress.com/2010/07/forward.jpg?w=157&#038;h=300" alt="" width="157" height="300" /></a>Scientific research data is, without a doubt, a central component in the lifecycle of knowledge production. For one thing, scientific data is critical to the <a href="http://plato.stanford.edu/entries/popper/">corroboration (or falsification)</a> of theories. Equally important to the process of scientific inquiry is making this data openly available to others &#8211; as is vividly demonstrated by the so-called &#8220;<a href="http://en.wikipedia.org/wiki/Climatic_Research_Unit_e-mail_hacking_incident">ClimateGate</a>&#8221; controversy and the more recent cloud over <a href="http://www.nytimes.com/2010/08/12/education/12harvard.html?ref=science">Marc Hauser&#8217;s research data</a> on primate cognition. The public accessibility of data enables open peer review and encourages the reproducibility of results.</p>
<p>Hence the importance of data management practices in 21st century science libraries: the curation of, access to and preservation of scientific research data sets will be critical to the future of scientific discourse.</p>
<p>It is true that “Big Science” has been in the business of curating “reference data” for years. Institutional data centers in many disciplines have been gathering large amounts of data in databases that contain the fruit of years of research. <a href="http://www.ncbi.nlm.nih.gov/genbank/">GenBank</a>, for instance, is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (containing over 150,000 sequence records.)</p>
<p>However, other kinds of data gathered by scientists are either transient or highly context-dependent and are not being preserved for the long-term benefit of future research either by individuals or by institutions. This might not be so serious for those data elements that are reproducible – either by experiment or simulation – but much of this data, such as data on <a href="http://www.sciencemag.org/cgi/content/abstract/science.1195223">oil-content and dissipation rates in the Gulf of Mexico</a> water column in 2010, is uniquely valuable and irreproducible.</p>
<p>As I indicated in <a href="http://synthese.wordpress.com/2010/04/29/the-future-of-data-information-retrieval/">a previous post</a>, one development that will help redress the problems endured by small, orphaned and inaccessible datasets is the emergence of methods of uniquely referencing datasets such as the data <a href="http://www.doi.org/">DOI</a>s that are being implemented by <a href="http://www.datacite.org/">DataCite</a> partners.  With the combination of data-deposit policies by science research funding agencies (such as <a href="http://www.nsf.gov/sbe/ses/common/archive.jsp">NSF</a> in the US and <a href="http://www.nserc-crsng.gc.ca/Professors-Professeurs/FinancialAdminGuide-GuideAdminFinancier/Responsibilities-Responsabilites_eng.asp">NSERC</a> in Canada) and peer recognition from university faculty for contributions to data repositories, the status of data publication and referencing will soon grow to match the present status of scholarly publications.</p>
<p>In parallel, the growing “<a href="http://en.wikipedia.org/wiki/Open_science_data">open access for Data</a>” movement and other initiatives to increase the availability of data generated by government and government-funded institutions (including <a href="http://gcmd.nasa.gov/">NASA</a>, the NIH and the <a href="http://data.worldbank.org/">World Bank</a>) are now well underway in a manner consistent with the <a href="http://www.oecd.org/dataoecd/9/61/38500813.pdf">OECD’s principles</a>, which, incidentally, offer a long and convincing list of economic and social benefits to be obtained from making scientific research data accessible.</p>
<p>In particular, the <a href="http://www.data.gov/">United States</a>, the <a href="http://data.gov.uk/">UK</a> and <a href="http://ands.org.au/">Australia</a> are spearheading the effort of making public and scientific research data more accessible. For instance, in the U.S., the National Science and Technology Council (NSTC)’s recent <a href="http://www.nitrd.gov/About/Harnessing_Power_Web.pdf">report to President Obama</a> details a comprehensive strategy to promote the preservation of and access to digital scientific data.</p>
<p>These reports and initiatives show that the momentum is building globally to realize visions that have been articulated in principle by several bodies concerned with the curation and archiving of data in the first decade of the 21<sup>st</sup> century (see <a href="http://www.arl.org/bm~doc/digdatarpt.pdf">To Stand the Test of Time</a> and <a href="http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf">Long Lived Scientific Data Collections</a>).</p>
<p>In Canada, several similar reports such as the <a href="http://data-donnees.gc.ca/docs/NCASRDReport.pdf">Consultation on Access to Scientific Research Data</a> and the <a href="http://www.collectionscanada.gc.ca/cdis/012033-1000.01-e.html">Canadian Digital Information Strategy</a> also point to the need for the national stewardship of digital information, not least scientific data sets. Despite much discussion, systematic efforts in the stewardship of Canadian digital scientific data sets are still only at the preliminary stages.  While there are well-managed and curated reference data collections in domains such as earth science (<a href="http://geogratis.cgdi.gc.ca/geogratis/en/index.html">Geogratis</a>) and astronomy (<a href="http://www3.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cadc/">Canadian Astronomy Data Centre</a>), which have communities of specialist scientific users whose needs are generally well met, the data-management needs of individual scientists in small, less well-funded research groups go unmet, and their data is either impossible to find or lost.</p>
<p>One impediment to the effective bibliographic curation of data sets is the absence of common standards. There are currently “no rules about how to publish, present, cite or otherwise catalogue datasets.” [Green, T (2009), “<a href="http://dx.doi.org/10.1787/603233448430">We Need Publishing Standards for Datasets and Data Tables</a>”, OECD Publishing White Paper, OECD Publishing]</p>
<p>CISTI’s <a href="http://data-donnees.cisti-icist.nrc-cnrc.gc.ca/gsi/ctrl?lang=en">Gateway to Scientific Data</a> sets and other such national sites (e.g. the <a href="http://www.ndad.nationalarchives.gov.uk/">British National Archives of Datasets</a>) that aggregate information about data sets use bibliographic standards (e.g. <a href="http://dublincore.org/">Dublin Core</a>) for representing metadata.  The advantage is that these standards are not domain-dependent yet are sufficiently rich to express the core elements of the content needed for archiving, storage and retrieval.  However, these metadata standards, developed for traditional bibliographic purposes, are not (yet) sufficiently rich to fully capture the wealth of scientific data from all disciplines, as I argued in a <a href="http://synthese.wordpress.com/2010/05/07/data-archiving/">previous post</a>.</p>
<p>One of the major concerns when deciding on the feasibility of creating a data repository is the cost associated with the deposit, curation and long-term preservation of research data. Typically, costs depend on a variety of factors including how each of the typical phases (planning, acquisition, disposal, ingest, archive, storage, preservation and access services) is deployed (see the JISC reports “Keeping Research Data Safe” <a href="http://www.jisc.ac.uk/publications/reports/2008/keepingresearchdatasafe.aspx">Part 1</a> and <a href="http://www.jisc.ac.uk/publications/reports/2010/keepingresearchdatasafe2.aspx">Part 2</a>). The costs associated with different data collections are also likely to vary considerably according to how precious (rare/valuable) the stored information is and what the requirements are for access over time.</p>
<p>One point to note from the “Keeping Research Data Safe” reports commissioned for JISC is that</p>
<blockquote><p>“the costs of archiving activities (archival storage and preservation planning and actions) are consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access.”</p></blockquote>
<p>In short &#8211; librarianship for datasets is critical to the future of science, and technology costs are the least of our concerns.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/08/23/scientific-research-data/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/07/forward.jpg?w=157" medium="image">
			<media:title type="html">forward</media:title>
		</media:content>
	</item>
		<item>
		<title>Prolog&#8217;s Death</title>
		<link>http://synthese.wordpress.com/2010/08/21/prologs-death/</link>
		<comments>http://synthese.wordpress.com/2010/08/21/prologs-death/#comments</comments>
		<pubDate>Sat, 21 Aug 2010 20:47:56 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Logic]]></category>
		<category><![CDATA[Logic Programming]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=637</guid>
		<description><![CDATA[Maarten van Emden just posted a terrific and authoritative account of one episode in the history of Prolog under the title &#8220;Who Killed Prolog&#8221; (and, tantalizingly, promises another episode soon featuring my other super-heroic programming language, Lisp). According to van Emden, perhaps best known (by citation counts, anyway) as co-author (with Bob Kowalski) of the seminal 1976 [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Prolog" src="http://groups.engin.umd.umich.edu/CIS/course.des/cis400/prolog/prolog.jpg" alt="" width="224" height="88" />Maarten van Emden just posted a terrific and authoritative account of one episode in the history of Prolog under the title &#8220;<a href="http://vanemden.wordpress.com/2010/08/21/who-killed-prolog/">Who Killed Prolog</a>&#8221; (and, tantalizingly, promises another episode soon featuring my other super-heroic programming language, Lisp).</p>
<p>According to van Emden, perhaps best known (by citation counts, anyway) as co-author (with Bob Kowalski) of the seminal 1976 JACM paper &#8220;<a href="http://portal.acm.org/citation.cfm?id=321991">The Semantics of Predicate Logic as a Programming Language</a>&#8221;, the culprit in this whodunit is the boondoggle <a href="http://www.sjsu.edu/faculty/watkins/5thgen.htm">Fifth-Generation Computer System</a> (FGCS) project.</p>
<p>van Emden&#8217;s historical account of what went wrong is completely correct, but I am not sure that this is all there is to it. I think there are (also?) technological and cognitive model issues with the language that are just as important to explaining its eventual demise.</p>
<p>I have had many opportunities to teach Prolog to programmers and by far the biggest cognitive problem that they have with this language is understanding what the interpreter is doing at any point in time. Prolog&#8217;s attempt at being declarative (I say &#8220;attempt&#8221; because I don&#8217;t think it succeed quite well enough) is the problem: how to get a computer to <em>do </em>something without telling it <em>what </em>to do?</p>
<p>The art of computer programming isn&#8217;t taught or practiced as the art of specifying a problem &#8211; it should be, perhaps, but it isn&#8217;t. Arguably, the <a href="http://en.wikipedia.org/wiki/Ada_(programming_language)">imperative programming paradigm</a> is a more natural fit with the <a href="http://www.csupomona.edu/~hnriley/www/VonN.html">von Neumann computer architecture</a> anyway; hence the popularity of strongly and statically typed imperative languages in which it is clear by inspection (or should be) what the machine is being instructed to do and on what data-objects these instructions should be performed.</p>
<p>The most confusing thing about Prolog is that whatever algorithm you implement with it must sit <em>on top of</em> the built-in ones, namely depth-first search and unification (and must use recursion rather than iteration). Two things are always going on during the execution of a Prolog program: the traversal of a search space, in which choice-points are introduced whenever multiple clauses match the current computational goal, and a process of (possibly partial) variable instantiation, which may be undone when the program traverses another branch at a choice-point.</p>
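<p>To make those two built-in mechanisms concrete, here is a minimal sketch in Python of a depth-first resolver with unification. It is an illustration only, not how any real Prolog system is implemented: the names <code>Var</code>, <code>walk</code>, <code>unify</code>, <code>rename</code> and <code>solve</code>, and the toy family database, are all invented for this example. It has no occurs check and no cut, and substitutions are copied rather than trailed, so "undoing" a binding simply means abandoning the copy made on a failed branch.</p>

```python
class Var:
    """A logic variable; compared by identity, so each renamed copy is distinct."""
    def __init__(self, name):
        self.name = name

def walk(term, subst):
    # Follow bindings until we reach a non-variable or an unbound variable.
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return an extended substitution, or None if a and b do not unify."""
    a, b = walk(a, subst), walk(b, subst)
    if a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None

def rename(term, mapping):
    # Give each use of a clause its own fresh variables.
    if isinstance(term, Var):
        if term not in mapping:
            mapping[term] = Var(term.name)
        return mapping[term]
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return term

def solve(goals, subst, program):
    """Depth-first, left-to-right search; yields one substitution per solution."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:  # every matching clause is a choice-point
        mapping = {}
        head = rename(head, mapping)
        body = [rename(g, mapping) for g in body]
        extended = unify(first, head, subst)
        if extended is not None:
            # Bindings in `extended` are dropped once this branch is exhausted.
            yield from solve(body + rest, extended, program)

# Toy database (hypothetical): three facts and one rule.
X, Y, Z = Var("X"), Var("Y"), Var("Z")
program = [
    (("parent", "tom", "bob"), []),
    (("parent", "tom", "liz"), []),
    (("parent", "bob", "ann"), []),
    (("grandparent", X, Z), [("parent", X, Y), ("parent", Y, Z)]),
]

Who = Var("Who")
answers = [walk(Who, s) for s in solve([("grandparent", "tom", Who)], {}, program)]
print(answers)
```

<p>Even in this tiny query the search binds Y to liz, finds that liz has no children, and discards that binding before finishing. That silent try-and-retract is exactly the part newcomers find hard to follow.</p>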
<p>That this process of computation is difficult to grok is especially noticeable when you try to debug a Prolog program. Computations get undone when attempts at satisfying a goal fail; other computations get retried down different branches, resulting in different unifications. Worst of all, the order in which you wrote your clauses in the program makes a difference to how it gets executed and, indeed, to whether any part of the program is reachable.</p>
<p>I think this is just the kind of computer-generated complexity that, like multiple inheritance in object-oriented languages, a programmer can really do without. For most programming tasks, except, perhaps, the kind found in computational linguistics, the fruits of these cognitive extravagances are not worth the expense.</p>
<p>So yes, the FGCS project was a boondoggle that contributed to Prolog&#8217;s death, but if Prolog had been easier to understand &#8211; perhaps with some stronger typing and some greater degree of declarativeness (such as can be found in some experimental descendants of Prolog such as <a href="http://www.scs.leeds.ac.uk/hill/GOEDEL/expgoedel.html">Goedel</a>) &#8211; it might have survived.</p>
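<p>The clause-order point can be shown with an even smaller sketch, again in Python and again purely illustrative. The propositional program below corresponds to the two Prolog clauses <code>p.</code> and <code>p :- p.</code>; the <code>prove</code> and <code>try_prove</code> functions and the step limit are assumptions of this example (a real Prolog system would simply recurse until it ran out of stack).</p>

```python
class OutOfSteps(Exception):
    """Raised when the hypothetical step budget is used up."""

def prove(goal, program, steps):
    # Try clauses in textual order, depth-first, as Prolog does.
    # `steps` is a one-element list used as a shared countdown.
    for head, body in program:
        if head != goal:
            continue
        steps[0] -= 1
        if steps[0] == 0:
            raise OutOfSteps()
        if all(prove(subgoal, program, steps) for subgoal in body):
            return True
    return False

def try_prove(goal, program, limit=50):
    try:
        return prove(goal, program, [limit])
    except OutOfSteps:
        return "gave up"

# The same two clauses, logically equivalent, in two different orders.
fact_first = [("p", []), ("p", ["p"])]   # p.       p :- p.
rule_first = [("p", ["p"]), ("p", [])]   # p :- p.  p.

print(try_prove("p", fact_first))
print(try_prove("p", rule_first))
```

<p>With the fact first the goal succeeds on the first clause tried; with the recursive rule first the search keeps descending and never reaches the fact, even though both programs say the same thing when read as logic.</p>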
<p>Then again, perhaps not &#8211; <a href="http://en.wikipedia.org/wiki/Ada_(programming_language)">Ada</a>, after all, is pretty much dead too and it had none of these problems. Maybe it really is, as Maarten suggests, primarily a social phenomenon.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/08/21/prologs-death/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://groups.engin.umd.umich.edu/CIS/course.des/cis400/prolog/prolog.jpg" medium="image">
			<media:title type="html">Prolog</media:title>
		</media:content>
	</item>
		<item>
		<title>Springer Open (Access)</title>
		<link>http://synthese.wordpress.com/2010/06/29/springer-open-access/</link>
		<comments>http://synthese.wordpress.com/2010/06/29/springer-open-access/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 12:33:28 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=598</guid>
		<description><![CDATA[The science publisher Springer has announced that it has fully adopted the open access model for its on-line journals: Springer Open! Not only is that a progressive move, it&#8217;s an economic necessity. As academic libraries are cutting back on subscriptions to deal with budget cuts and publishers increase their subscription fees, the net result of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2010/06/springer.jpg"><img class="alignleft size-full wp-image-599" title="springer" src="http://synthese.files.wordpress.com/2010/06/springer.jpg?w=460" alt=""   /></a>The science publisher Springer has <a href="http://www.iwr.co.uk/academic-and-humanites/3010317/Springer-extends-open-access-to-cover-all-disciplines">announced</a> that it has fully adopted the open access model for its on-line journals: <a href="http://www.springeropen.com/">Springer Open</a>!</p>
<p>Not only is that a progressive move, it&#8217;s an economic necessity. As academic libraries cut back on subscriptions to deal with budget cuts while publishers increase their subscription fees, the traditional economic model can only spell disaster, as evidenced by the recent and public <a href="http://chronicle.com/article/U-of-California-Tries-Just/65823/">battle between the University of California and Nature Publishing Group</a>.</p>
<p>Making authors and research funders pay for academic publishing and giving away access to readers seems to be the only viable model left. I think it&#8217;s only a matter of time before other academic publishers follow suit.</p>
<p>I worry a little about independent researchers who don&#8217;t have the thousands of dollars in grant money that are going to be required to engage in the peer-reviewed publishing process. University budgets are being squeezed too, and the Open Access model is going to add pressure on that part of the overall academic publishing ecosystem.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/06/29/springer-open-access/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/06/springer.jpg" medium="image">
			<media:title type="html">springer</media:title>
		</media:content>
	</item>
	</channel>
</rss>
