<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Synthèse</title>
	<atom:link href="http://synthese.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://synthese.wordpress.com</link>
	<description>&#34;Opération qui procède du simple au composé, de l&#039;élément au tout.&#34;</description>
	<lastBuildDate>Sat, 08 Jun 2013 12:55:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='synthese.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Synthèse</title>
		<link>http://synthese.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://synthese.wordpress.com/osd.xml" title="Synthèse" />
	<atom:link rel='hub' href='http://synthese.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Freedom Abhors a Chill</title>
		<link>http://synthese.wordpress.com/2013/03/24/freedom-abhors-a-chill/</link>
		<comments>http://synthese.wordpress.com/2013/03/24/freedom-abhors-a-chill/#comments</comments>
		<pubDate>Sun, 24 Mar 2013 18:06:58 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Ethics]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=1050</guid>
		<description><![CDATA[Jian Ghomeshi&#8217;s opening monolog on CBC&#8217;s radio program Q is the latest salvo against Library and Archives Canada&#8217;s new Code of Conduct. In it he uses the phrase &#8220;Freedom Abhors a Chill&#8221;.  And a chill it is: The BC Library Association has condemned it in writing. BC Archivist Myron Groover was polite but firm on [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=1050&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2013/03/screen-shot-2013-03-24-at-12-55-43-pm.png"><img class="alignleft  wp-image-1051" alt="Screen Shot 2013-03-24 at 12.55.43 PM" src="http://synthese.files.wordpress.com/2013/03/screen-shot-2013-03-24-at-12-55-43-pm.png?w=276&#038;h=93" width="276" height="93" /></a>Jian Ghomeshi&#8217;s <a href="http://www.cbc.ca/q/blog/2013/03/22/jians-opening-essay-on-library-and-archives-canada/">opening monolog on CBC&#8217;s radio program Q</a> is the latest salvo against Library and Archives Canada&#8217;s new Code of Conduct. In it he uses the phrase &#8220;Freedom Abhors a Chill&#8221;.  And a chill it is:</p>
<iframe class="scribd_iframe_embed" src="http://www.scribd.com/embeds/130187655/content?start_page=1&view_mode=&access_key=key-14lbjy2m72sdgmvu2vxr" data-auto-height="true" scrolling="no" id="scribd_130187655" width="100%" height="500" frameborder="0"></iframe>
<div style="font-size:10px;text-align:center;width:100%"><a href="http://www.scribd.com/doc/130187655">View this document on Scribd</a></div>
<p>The <a href="http://bclainfopolicycommittee.wordpress.com/2013/03/20/bcla-press-release-on-lac-code-of-conduct/">BC Library Association has condemned it</a> in writing. BC Archivist Myron Groover was <a href="http://www.cbc.ca/player/AudioMobile/As+It+Happens/ID/2352464065/">polite but firm</a> on “As It Happens”.</p>
<p>Members of Parliament for the Official Opposition Andrew Cash and Pierre Nantel gave the Heritage Minister a piece of their mind about it in the Canadian House of Commons:</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='460' height='289' src='http://www.youtube.com/embed/BlEYlzwvJXg?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p>Jim Turk, Executive Director of the Canadian Association of University Teachers (CAUT) gave a clear explanation of what&#8217;s at stake in <a href="http://www.rcinet.ca/english/daily/interviews-2012/14-44_2013-03-18-librarians-warned-of-loyalty-duty-to-canada-s-government-high-risk-activities/">an interview on Radio Canada International</a>.</p>
<p>My question is &#8211; we&#8217;ve expressed our collective outrage at this Orwellian nightmare &#8211; and now what? Do we decide that Federal archival and library institutions are doomed and take on their role on the remaining islands of democracy or &#8220;&#8230;take arms against a sea of troubles, and by opposing end them&#8221;?</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2013/03/24/freedom-abhors-a-chill/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2013/03/screen-shot-2013-03-24-at-12-55-43-pm.png" medium="image">
			<media:title type="html">Screen Shot 2013-03-24 at 12.55.43 PM</media:title>
		</media:content>
	</item>
		<item>
		<title>Is Clippy the Future?</title>
		<link>http://synthese.wordpress.com/2013/02/08/is-clippy-the-future/</link>
		<comments>http://synthese.wordpress.com/2013/02/08/is-clippy-the-future/#comments</comments>
		<pubDate>Fri, 08 Feb 2013 13:48:46 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=1036</guid>
		<description><![CDATA[The student-led Information without Borders conference that I attended at Dalhousie yesterday was truly excellent &#8211; as much for its organization (all by students!) as for its diverse topics: the future of libraries, cloud computing, recommender systems, sciverse apps and the foundations for innovation. At the panel discussion in which I participated, I suggested that to predict [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=1036&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2013/02/iwblogo.jpg"><img class="alignleft  wp-image-1037" alt="iwblogo" src="http://synthese.files.wordpress.com/2013/02/iwblogo.jpg?w=168&#038;h=179" width="168" height="179" /></a>The student-led <a href="http://iwbconference.informationmanagement.dal.ca/">Information without Borders</a> conference that I attended at Dalhousie yesterday was truly excellent &#8211; as much for its organization (<em>all</em> by students!) as for its diverse topics: the future of libraries, cloud computing, recommender systems, sciverse apps and the foundations for innovation.</p>
<p>At the panel discussion in which I participated, I suggested that to predict the future one need only look at the past. To predict the iPad one needed only look at the Apple Newton (which died in 1998). What was the analog, I wondered, for an information retrieval tool, now dead and buried, that might still evolve into something we all want in the field of information management?</p>
<p>I proposed that the future of information retrieval might be something like an evolved <a href="http://en.wikipedia.org/wiki/Office_Assistant">Office Assistant</a> (affectionately nicknamed &#8220;Clippy&#8221;) &#8211; the infamous, now deceased Microsoft Paperclip that assisted you in understanding and navigating Microsoft products.</p>
<p>My vision for a next generation Clippy was clearly not well articulated since it prompted the following tweet from Stephen Abram:</p>
<p style="text-align:center;"><a href="http://synthese.files.wordpress.com/2013/02/abram-tweet2.png"><img class="size-full wp-image-1040 aligncenter" alt="abram-tweet" src="http://synthese.files.wordpress.com/2013/02/abram-tweet2.png?w=460&#038;h=68" width="460" height="68" /></a></p>
<p>I think that <a href="http://en.wikipedia.org/wiki/Siri_(software)">Siri</a> (about which I <a href="http://synthese.wordpress.com/2010/05/09/siri-imgenie-reborn/">posted a few years ago</a>) belongs to the old Clippy style of annoying and in-the-way-of-what-I-want-to-do applications. I am surprised it has survived so long and was promoted by Apple so strongly. I predict it will join Clippy, <a href="http://synthese.wordpress.com/2009/05/30/google-wave/">Google Wave</a> and <a href="http://en.wikipedia.org/wiki/Project_Glass">Google Glasses</a> on the growing heap of unwanted technologies that were not ready for prime-time.</p>
<p>Watson (who is <a href="http://bits.blogs.nytimes.com/2012/10/30/i-b-m-s-watson-goes-to-medical-school/">now going to medical school</a>, and about which I also <a href="http://synthese.wordpress.com/2011/02/19/learning-from-watson/">posted a couple of years ago</a>) is, however, just the sort of Natural Language Understanding component technology that I have in mind for an interactive, personal information assistant. When a computer that now costs three million dollars with 15 terabytes of RAM can fit in your pocket and cost $500, a Watson-like system that understands natural language queries will be an important component of Clippy++.</p>
<p>What neither Watson nor Siri has &#8211; and what I foresee in my crystal ball as the most significant attribute of &#8220;Clippy++&#8221; &#8211; is personalization and autonomy. What will make true personalization possible with &#8220;Clippy++&#8221; is our collective willingness to accept the intrusion of a mechanical supervisor that learns from our behaviour about what we want, need and expect.</p>
<p>This culture-shift is happening right now &#8211; we gladly and willingly disclose our information consumption habits to supervisory software and data-analytics engines in exchange for entertainment and social networking. It won&#8217;t be long before we&#8217;re willing to do that for serious, personalized information management purposes as well.</p>
<p>The key, though, is going to be the <em>interaction</em> &#8211; the dialog that we have with Clippy++ &#8211; and it will have to have explanations for its actions and recommendations. That&#8217;s going to be the hallmark of its evolution to Machina Sapiens.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2013/02/08/is-clippy-the-future/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2013/02/iwblogo.jpg" medium="image">
			<media:title type="html">iwblogo</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2013/02/abram-tweet2.png" medium="image">
			<media:title type="html">abram-tweet</media:title>
		</media:content>
	</item>
		<item>
		<title>The End of Files</title>
		<link>http://synthese.wordpress.com/2012/12/08/the-end-of-files/</link>
		<comments>http://synthese.wordpress.com/2012/12/08/the-end-of-files/#comments</comments>
		<pubDate>Sat, 08 Dec 2012 20:07:51 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Digital library]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=1013</guid>
		<description><![CDATA[A few weeks ago, I boldly predicted in my class on copyright that the computer file was as doomed in the annals of history as the piano roll (the last of which was printed in 2008 &#8211; See this documentary video on YouTube on how they are made and copied!) This is a slightly different prediction than the [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=1013&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A few weeks ago, I boldly predicted in my class on copyright that the <a href="http://en.wikipedia.org/wiki/Computer_file">computer file</a> was as doomed in the annals of history as the piano roll (the last of which was printed in 2008 &#8211; see this documentary video on YouTube on how they are made and copied!).</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='460' height='289' src='http://www.youtube.com/embed/i3FTaGwfXPM?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p>This is a slightly different prediction than the one made by the Economist in 2005: <a href="http://www.economist.com/node/4368267">Death to Folders</a>. Their argument was that folders as a method of organizing files were obsolete and that search, tagging and &#8220;smart folders&#8221; were going to change everything. My assertion is that the very notion of a file &#8211; these things that are copied, edited, executed by computers &#8211; will eventually disappear (to the end-user, anyway).</p>
<p>The path to the &#8220;end of files&#8221; is more than just a question of <em>masking</em> the underlying data-representation from the user. It is true that Apps (as designed for mobile devices) have begun to do that as a convenient way of hiding the details of a file from the user &#8211; be it an application file or a document file.  The reason that Apps (generally) contain within them the (references to) data-items (i.e. files) that they need, particularly if the information is stored in the cloud, is to provide a Digital Rights Management scheme. Which is no doubt why this App model is slowly creeping its way from mobile devices to mainstream laptops and desktops (viz. Mac OS Mountain Lion and Windows 8).</p>
<p>But this is just the beginning.  There&#8217;s going to be a paradigm shift (a perfectly fine phrase, when it&#8217;s used correctly!) in our mental representations of computing objects and it is going to be more profound than merely masking the existence of the underlying representation. I think the new paradigm that will replace &#8220;file&#8221; is going to be: &#8220;the set of information items and interfaces that are needed to perform some action in the current use-context&#8221;.</p>
<p>Consider, as an example of this trend towards the new paradigm, Wolfram&#8217;s <a href="http://www.wolfram.com/cdf/">Computable Document Format</a>. In this model, documents are created by dynamically assembling components from different places and performing computations on them.  In this model there are distributed, raw information components &#8211; data mostly &#8211; that are assembled in the application and don&#8217;t correspond to a &#8220;file&#8221; at all. Or consider information mashups like Google Maps, in which restaurant reviews and recommendations are generated as a function of search-history, location, and user-identity.  These &#8220;content-bundles&#8221;, for want of a better phrase, are definitely not files or documents but, from the end-user&#8217;s point of view, they are also indistinguishable from them.</p>
<p>Even MS Word DocX &#8220;files&#8221; are instances of this new model.  The <a href="http://msdn.microsoft.com/en-us/library/aa338205.aspx">Office Open XML file format</a> is a standardized data-structure: XML components bound together in a zip file. Imagine de-regimenting this convention a little and what constitutes a &#8220;document&#8221; could change quite significantly.</p>
<p>Conventional, static files will continue to exist for some time and <a href="http://en.wikipedia.org/wiki/Revision_control">version control systems</a> will continue to provide change management services to what we now know as &#8220;files&#8221;. But I predict that my grandchildren won&#8217;t know what a file is &#8211; and won&#8217;t need to.  The procedural instructions required for assembling information-packages out of components, including the digital rights constraints that govern them, will eventually dominate the world of consumable digital content to the point where the idea of a file will be obsolete.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/12/08/the-end-of-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>
	</item>
		<item>
		<title>Marissa Mayer Wants to Read Your Mind</title>
		<link>http://synthese.wordpress.com/2012/08/14/marissa-mayer-wants-to-read-you-mind/</link>
		<comments>http://synthese.wordpress.com/2012/08/14/marissa-mayer-wants-to-read-you-mind/#comments</comments>
		<pubDate>Wed, 15 Aug 2012 02:01:36 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Digital Identity]]></category>
		<category><![CDATA[Personal identity]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=989</guid>
		<description><![CDATA[At about minute 3 of Charlie Rose&#8217;s Green Room interview with Marissa Mayer, the newly minted CEO of Yahoo offers a vision of the mobile future and asks &#8220;How do we create a search without search? Can we figure out the information you need before you even have to ask?&#8221; And, she says excitedly, &#8220;that&#8217;s [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=989&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><img class="alignleft" src="http://5.mshcdn.com/wp-content/uploads/2012/07/marissa-mayer-600.jpg" alt="" width="252" height="158" />At about minute 3 of Charlie Rose&#8217;s <a href="http://www.charlierose.com/view/clip/12214">Green Room interview with Marissa Mayer</a>, the newly minted CEO of Yahoo offers a vision of the mobile future and asks &#8220;How do we create a search without search? Can we figure out the information you need before you even have to ask?&#8221; And, she says excitedly, &#8220;that&#8217;s really like mind reading technology!&#8221;</p>
<p>The inference? Be prepared for Yahoo to read your mind!</p>
<p>I have been a proponent of personalization since 2000, when I worked on developing &#8220;Personal Identity Management&#8221; services at Nortel. The idea at the time was (for a telecom company) to enable IP devices (routers / gateways) to track / manage / control your on-line identity and provide identity services (single sign-on, personalization of news services, etc.) to the user.</p>
<p>This was conceived at about the time that <a href="http://msdn.microsoft.com/en-us/library/bb263932(v=vs.85).aspx">Microsoft Hailstorm</a> was being launched. The only fundamental difference was &#8211; which service provider &#8211; &#8220;network access&#8221; vs. &#8220;operating system&#8221; vs. &#8220;third party service&#8221; &#8211; would be the trusted source for managing your identity.</p>
<p>From a public relations point of view, Hailstorm and its successors, Microsoft Passport and Wallet, were <a href="http://www.zdnet.com/blog/bott/why-does-microsoft-passport-suck/30">a disaster</a>. Invasion of privacy, identity theft, all the usual public anxiety buttons were pressed and Microsoft dropped a lot of these products &#8211; or at least gave them a makeover.</p>
<p>Yet, a few internet generations later, these ideas persist.  Google didn&#8217;t make a big PR campaign of it, but everything at Google is about personalization and localization as illustrated most graphically by the (dystopic?) <a href="http://www.youtube.com/watch?v=9c6W4CCU9M4">Google Glasses video</a>.</p>
<p>But &#8211; fortunately, I might add &#8211; I am noticing a (small) swing of the pendulum away from machine-learning, Netflix-style personalization towards a &#8220;how do you want it?&#8221; style of personalization.</p>
<p>For instance, Google News used to be fully and automatically biased towards your location. Since the summer of 2011, Google has given the end-user a great deal more control.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='460' height='289' src='http://www.youtube.com/embed/JmxL5BlVzZQ?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p>Marissa Mayer may want to read your mind, but I know that most people <em>don&#8217;t</em> want to have their minds read by machines. I think the trend towards greater user-control will eventually spread to more personalization and recommender services. I hope so anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/08/14/marissa-mayer-wants-to-read-you-mind/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://5.mshcdn.com/wp-content/uploads/2012/07/marissa-mayer-600.jpg" medium="image" />
	</item>
		<item>
		<title>The Future of Universities is Here</title>
		<link>http://synthese.wordpress.com/2012/07/19/the-future-of-universities-is-here/</link>
		<comments>http://synthese.wordpress.com/2012/07/19/the-future-of-universities-is-here/#comments</comments>
		<pubDate>Thu, 19 Jul 2012 12:44:47 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Open Access]]></category>
		<category><![CDATA[Universities]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=977</guid>
		<description><![CDATA[An impressive list of 16 universities (including the Ecole Polytechnique Federale de Lausanne and the University of Edinburgh) have now signed up with Coursera to offer free on-line courses.  I audited one a few months ago on Natural Language Processing (from Stanford) to see what it was like &#8211; it was stunningly good. My very first thought [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=977&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2012/07/coursera-2.jpg"><img class="alignleft size-medium wp-image-979" title="coursera-2" src="http://synthese.files.wordpress.com/2012/07/coursera-2.jpg?w=300&#038;h=137" alt="" width="300" height="137" /></a>An impressive list of 16 universities (including the <a href="https://www.coursera.org/epfl">Ecole Polytechnique Federale de Lausanne</a> and the <a href="https://www.coursera.org/edinburgh">University of Edinburgh</a>) have now signed up with <a href="http://coursera.org">Coursera</a> to offer free on-line courses.  I audited one a few months ago on <a href="https://www.coursera.org/course/nlp">Natural Language Processing</a> (from Stanford) to see what it was like &#8211; it was stunningly good.</p>
<p>My very first thought was &#8220;<a href="http://twitter.com/vellino/status/181085816423067648">the future of conventional universities is in doubt</a>&#8221;. This course alone had 42,000 registrants, 24,000 of whom watched at least one video. Only 1,400 of the registrants got a &#8220;certificate of achievement&#8221; (i.e. completed the course and handed in all the assignments) but in the meantime there were 800,000 video-downloads of the courseware.</p>
<p>Distance-learning or on-line courses have been around for a long time &#8211; in the same way that &#8220;finger&#8221;, &#8220;who&#8221; and &#8220;chat&#8221; in Unix had been around a long time before Facebook, LinkedIn and Instant Messaging.  The difference now is that major universities are jumping on the bandwagon and offering them for free.  Why? Perhaps because of decreasing enrolment: free on-line courses are a way to recruit students from everywhere and to show them the best of what universities have to offer.</p>
<p>But also (in the US anyway), education is a business (see the Frontline documentary on the business of higher education: <a href="http://www.pbs.org/wgbh/pages/frontline/collegeinc/view/">College Inc.</a>)  That universities are feeling the financial pinch and being pressed by their boards to be more aggressive in the marketplace was perhaps most visibly illustrated at the <a href="https://www.nytimes.com/2012/06/27/education/university-of-virginia-reinstates-ousted-president.html?_r=1&amp;pagewanted=all">University of Virginia</a> (the case against on-line education is elegantly articulated by Mark Edmundson &#8211; a professor of English at the University of Virginia &#8211; in a <a href="http://www.nytimes.com/2012/07/20/opinion/the-trouble-with-online-education.html">New York Times OpEd</a> article).</p>
<p>Making on-line courses available for free will be a moneymaker when they start counting towards a degree, which is clearly inevitable in the long run. However, I didn&#8217;t expect this development to come so soon after the beginning of the experiment. The Seattle Times reported just yesterday that <a href="http://seattletimes.nwsource.com/html/localnews/2018714077_coursera19m.html">the University of Washington is going to be offering some of their Coursera courses for credit</a>.</p>
<p>Canada, in the meantime, has its own <a href="http://www.cvu-uvc.ca/english.html">Canadian Virtual University</a> which lists over <a href="http://www.cvu-uvc.ca/cgi-bin/cvu/cvucrsinfo.cgi?qn=subject&amp;lang=en">2,000 courses</a> and 300 degrees and diplomas available on-line. The difference with Coursera is that the CVU is not free.</p>
<p>Anyone see any parallels with the publishing industry here?</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/07/19/the-future-of-universities-is-here/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2012/07/coursera-2.jpg?w=300" medium="image">
			<media:title type="html">coursera-2</media:title>
		</media:content>
	</item>
		<item>
		<title>Government Research in Canada</title>
		<link>http://synthese.wordpress.com/2012/07/08/government-research-in-canada/</link>
		<comments>http://synthese.wordpress.com/2012/07/08/government-research-in-canada/#comments</comments>
		<pubDate>Sun, 08 Jul 2012 20:01:19 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Government Science]]></category>
		<category><![CDATA[Universities]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=948</guid>
		<description><![CDATA[When I started as a Research Officer at the National Research Council six years ago, the idea of &#8220;research&#8221; &#8211; in the sense of systematically studying a topic for the purpose of advancing knowledge in the field &#8211; was not only encouraged but constitutive of the job description. In most respects, the work of an [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=948&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.bioaccess.ca/images/stories/nrc-badge2010-e-cymk.png"><img class="alignleft" src="http://www.bioaccess.ca/images/stories/nrc-badge2010-e-cymk.png" alt="" width="149" height="91" /></a>When I started as a Research Officer at the National Research Council six years ago, the idea of &#8220;research&#8221; &#8211; in the sense of systematically studying a topic for the purpose of advancing knowledge in the field &#8211; was not only encouraged but constitutive of the job description. In most respects, the work of an NRC Research Officer was indistinguishable from that of a University Professor &#8211; minus the teaching responsibilities.</p>
<p>Since then, there has been a gradual but significant shift in the function of Government research institutions in Canada. For instance, according to <a href="http://www.researchmoneyinc.com/conferences/201105/ppt/National%20Research%20Council-%20John%20McDougall.ppt">a presentation given to &#8220;Re$earch Money&#8221;</a> by the president of the NRC, its Vision is:</p>
<blockquote><p>To be the most effective research and technology organization in the world, stimulating sustainable domestic prosperity.</p></blockquote>
<p>And its Mission is</p>
<blockquote><p>Working with clients and partners, we provide strategic research, scientific and technical services to develop and deploy solutions to meet Canada’s current and future industrial and societal needs.</p></blockquote>
<p>The first question that comes up with the Vision is: what is a &#8220;research and technology organization&#8221;? That phrase &#8211; &#8220;RTO&#8221; for those in the know &#8211; means something quite specific. It is a label for the set of things that includes such institutions as the <a href="http://www.fraunhofer.de/en.html">Fraunhofer Institute</a> and <a href="http://www.battelle.org/">Battelle</a> but also Finland&#8217;s <a href="http://www.vtt.fi/vtt/index.jsp">VTT</a> (&#8220;Business from Technology&#8221;) and <a href="http://www.rto.nato.int/Main.asp?topic=18">NATO&#8217;s RTO</a>.</p>
<p>Organizations like that do interesting things: they are catalysts for exchanging information, they set strategies, give advice, design new products, patent processes and bring mature ideas to commercial reality.  All of this is useful and important but it isn&#8217;t &#8220;basic research&#8221;, at least not in the sense of &#8220;advancing knowledge&#8221;.</p>
<p>So what is happening to basic research in government?  It is being outsourced to universities. The executive director of the Canadian Association of University Teachers (CAUT), James Turk, put it this way in an <a href="http://www.caut.ca/pages.asp?page=1078">op ed column in the Ottawa Citizen</a> a few months ago:</p>
<blockquote><p>[Minister Goodyear] claims that [the NRC] no longer needs to [undertake basic research] because universities today play that role.</p></blockquote>
<p>But, Turk also points out,</p>
<blockquote><p>Many university-based researchers <em>rely</em> upon the NRC for their scientific work. By gutting the basic research program of the NRC, the government will be weakening university research.</p></blockquote>
<p>Thus, from the government&#8217;s point of view, basic research should be an <a href="http://en.wikipedia.org/wiki/Externality">externality</a> because it incurs long-term costs and no short-term benefits. By outsourcing research to universities, long-term costs are downloaded to the provinces.</p>
<p>This was Nortel&#8217;s strategy too in its later years (~ 1995), and it was RIM&#8217;s as well (see also <a href="http://www.theglobeandmail.com/technology/canadas-vanishing-tech-sector/article4396596/">Canada&#8217;s Vanishing Tech Sector</a>).</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/synthese.wordpress.com/948/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/synthese.wordpress.com/948/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=948&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/07/08/government-research-in-canada/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.bioaccess.ca/images/stories/nrc-badge2010-e-cymk.png" medium="image" />
	</item>
		<item>
		<title>Steve Jobs was Right about AppleTV UI</title>
		<link>http://synthese.wordpress.com/2012/04/22/steve-jobs-was-right-about-appletv-ui/</link>
		<comments>http://synthese.wordpress.com/2012/04/22/steve-jobs-was-right-about-appletv-ui/#comments</comments>
		<pubDate>Sun, 22 Apr 2012 19:08:31 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Information]]></category>
		<category><![CDATA[User Interface]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=933</guid>
		<description><![CDATA[AppleInsider reported a few weeks ago that Steve Jobs rejected &#8211; as long as 5 years ago &#8211; the newly introduced Apple TV user interface. Predictably, Steve was right: the new UI for AppleTV has some major flaws in not just one but several dimensions: usability, cognitive modeling and information organization. Consider this snapshot of [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=933&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>AppleInsider reported a few weeks ago that <a href="http://www.appleinsider.com/articles/12/03/24/ex_apple_engineer_claims_steve_jobs_rejected_new_apple_tv_ui_5_years_ago.html">Steve Jobs rejected &#8211; as long as 5 years ago &#8211; the newly introduced Apple TV user interface</a>. Predictably, Steve was right: the new UI for AppleTV has some major flaws in not just one but several dimensions: usability, cognitive modeling and information organization.</p>
<p>Consider this snapshot of the old UI:</p>
<p style="text-align:center;"><a href="http://newtech.aurum3.com/images/apple-tv3.jpg"><img class="aligncenter" src="http://newtech.aurum3.com/images/apple-tv3.jpg" alt="" width="461" height="259" /></a></p>
<p>The top third of the screen is reserved for image thumbnails that correspond to offerings in the highlighted service.  The remote&#8217;s navigation buttons change only the horizontal and vertical menu choices and the menus correspond to the categories of services available. [The top-level thumbnails are also accessible to get to the item directly.]</p>
<p>Admittedly there are some problems with this way of organizing the user&#8217;s entertainment options. One is that the top-level categories are not all the same kind of thing. &#8220;Internet&#8221; is a mode of delivery (which, of course, is also the mode of delivery for the rest of AppleTV content), whereas the others are descriptive of the kind of objects that are below the main menu item. What &#8220;Internet&#8221; means, clearly, is &#8220;other, non-Apple applications&#8221;. In addition, more recent AppleTV top-level menus also have a &#8220;Computer&#8221; category, meaning &#8220;Content streamed from your local computer running iTunes&#8221;, adding a second source-centered category.</p>
<p>However, at least the old interface makes <em>some</em> attempt at grouping content. Furthermore, the interface for the top-level navigation resembles in structure the navigation system implemented for each of the applications.  The interface has the consistency hallmark of Apple interfaces generally: learn the interface for one application and you know (more or less) how all the others behave.</p>
<p>Contrast this with the new interface.  In some respects, it is similar to the old one &#8211; thumbnails of content-images appear at the top of the screen, as expected and the content sources are more or less the same.</p>
<p style="text-align:center;"><img class="aligncenter" src="http://support.firecore.com/attachments/token/kcygssvrty5spxx/?name=sp-succes-50.jpg" alt="" width="461" height="259" /></p>
<p>However, the artificial segregation by source or kind is eliminated altogether: <em>all</em> the applications are on the same footing, iPad-App style.</p>
<p>The first serious problem starts manifesting when you scroll just one line down: the 1/2-page sized thumbnails disappear altogether.  Yet the selected applications (I bet) are still generating those thumbnails &#8211; you just can&#8217;t see them any more.</p>
<p>Right away, this gives screen real estate dominance to the first row of applications &#8211; Apple iTunes applications, naturally. Furthermore, you can&#8217;t go straight to the items in the thumbnails because you can&#8217;t see them any more.</p>
<p style="text-align:left;">The second major flaw comes from the mixed-mode cognitive models.  The first-level application-selection mode is (vaguely) iPad-like (without the ability to group apps, rearrange them or create screen-pages). However, once you&#8217;ve selected an application you&#8217;re back to the (more familiar and sensible) menu-navigation system.</p>
<p style="text-align:left;">What&#8217;s worse, though, is that the menu system for each application is now no longer consistent. &#8220;Movies&#8221; (short for &#8220;iTunes Movie Store&#8221;) has a Mac-style top-level menu-bar rather than a right-side menu navigation bar like all the other applications. Gone is the consistent Apple look-and-feel.</p>
<p style="text-align:left;">If only the user at least had the ability to group applications as they see fit and to delete the unwanted ones (why not? The iPod/iPad allows that).</p>
<p style="text-align:left;">There&#8217;s just no doubt about it.  Steve was right.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/synthese.wordpress.com/933/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/synthese.wordpress.com/933/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=933&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/04/22/steve-jobs-was-right-about-appletv-ui/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://newtech.aurum3.com/images/apple-tv3.jpg" medium="image" />

		<media:content url="http://support.firecore.com/attachments/token/kcygssvrty5spxx/?name=sp-succes-50.jpg" medium="image" />
	</item>
		<item>
		<title>Building a Better Citation Index</title>
		<link>http://synthese.wordpress.com/2012/03/20/building-a-better-citation-index/</link>
		<comments>http://synthese.wordpress.com/2012/03/20/building-a-better-citation-index/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 19:06:55 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Citation]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=914</guid>
		<description><![CDATA[Scholars in a variety of disciplines (not just bibliometrics!) have been building better measures of scholarly output.  First came the H-index in 2005 followed by the G-index in 2006, and these are now part of the standard measures for scholarly output. However, as Daniel Lemire points out in his latest blog post, the raw data of mere citations [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=914&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><img class="alignleft" src="http://chronicle.com/blogs/profhacker/files/2011/11/ryan-citation-needed.jpg" alt="" width="168" height="126" />Scholars in a variety of disciplines (not just bibliometrics!) have been building better measures of scholarly output.  First came the <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1283832/">H-index</a> in 2005 followed by the <a href="http://www.akademiai.com/content/4119257t25h0852w/">G-index</a> in 2006, and these are now part of the standard measures for scholarly output.</p>
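<p>For readers who have not met these two measures, here is a minimal sketch of how they are usually computed from a list of per-paper citation counts. The function names and the sample counts are illustrative assumptions, and this g-index variant caps the result at the number of papers (some definitions do not).</p>

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's five papers.
counts = [10, 8, 5, 4, 3]
print(h_index(counts), g_index(counts))
```

<p>Both measures see only the counts, which is exactly the crudeness at issue: a reference made in passing weighs as much as one that shaped the paper.</p>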
<p>However, as Daniel Lemire points out in <a href="http://lemire.me/blog/archives/2012/03/20/from-counting-citations-to-measuring-usage-help-needed/">his latest blog post</a>, the raw data of mere citations is pretty crude.  In any given article, it&#8217;s often hard to tell which of the (typically) dozens of references are &#8220;en passant&#8221; (to fend off the critics who might think you haven&#8217;t read the literature) or incidental to the substance of the article. What&#8217;s interesting for the authors of the articles being cited is the question &#8220;how critical is this citation to the author who cited me?&#8221;</p>
<p>One way to find out (and hence, perhaps, to build a better citation measure) is to train a Machine Learning algorithm to extract &#8220;key citations&#8221; &#8211; by analogy with extracting &#8220;key phrases&#8221; from a text (see Peter Turney&#8217;s 2000 article <a href="http://dx.doi.org/10.1023/A:1009976227802">Machine Learning Algorithms for Keyphrase Extraction</a>). As a starting point, we&#8217;d like to compile data from researchers which asks the question: &#8220;What are the key references of your papers?&#8221;</p>
<p>It will take 10 minutes: please fill out <a href="https://docs.google.com/spreadsheet/viewform?formkey=dHlDalFfR1AzTXpaRXA2WEVlRUF5b0E6MA#gid=0">this Google-documents questionnaire</a>. In it we ask you, as the author of an article, to tell us which 1, 2, 3 or 4 references are essential to that article. By an essential reference, we mean a reference that was highly influential or inspirational for the core ideas in your paper; that is, a reference that inspired or strongly influenced your new algorithm, your experimental design, or your choice of a research problem.</p>
<p>When this survey is completed, we will be releasing the resulting data set under the <a href="http://opendatacommons.org/licenses/pddl/1-0/">ODC Public Domain Dedication and Licence</a> so that you can use this data in other ways, if you wish.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/synthese.wordpress.com/914/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/synthese.wordpress.com/914/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=914&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/03/20/building-a-better-citation-index/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://chronicle.com/blogs/profhacker/files/2011/11/ryan-citation-needed.jpg" medium="image" />
	</item>
		<item>
		<title>Elsevier Boycott &#8211; Academics, Get a Grip!</title>
		<link>http://synthese.wordpress.com/2012/02/25/elsevier-boycott-academics-get-a-grip/</link>
		<comments>http://synthese.wordpress.com/2012/02/25/elsevier-boycott-academics-get-a-grip/#comments</comments>
		<pubDate>Sat, 25 Feb 2012 18:40:14 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=888</guid>
		<description><![CDATA[At the risk of being shunned by the now 7,000+ prestigious colleagues who are actively boycotting Elsevier, I’d like to appeal to the better angels of their nature and ask them to stop whipping up a frenzy of outrage and indignation that pits Elsevier (“axis of evil”) against Us (“freedom of thought”). I worry that [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=888&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><img class="alignleft" alt="" src="http://www.elsevier.ca/images/ca/Elsevierlogo2.jpg" height="93" width="85" />At the risk of being shunned by the now 7,000+ prestigious colleagues who are <a href="http://thecostofknowledge.com/">actively boycotting Elsevier</a>, I’d like to appeal to the better angels of their nature and ask them to stop whipping up a frenzy of outrage and indignation that pits Elsevier (“axis of evil”) against Us (“freedom of thought”). I worry that this polarization of the issues is clouding our individual and collective judgement about what the fundamental problems are and what can and should be done about them.</p>
<p>It is undeniable that there are real and serious problems with academic publishing (as pointed out very cogently by Fields Medalist Tim Gowers <a href="http://gowers.wordpress.com/2012/01/21/elsevier-my-part-in-its-downfall/">here</a>, John Dupuis (Head of York’s Science Library) <a href="http://scienceblogs.com/confessions/2012/02/elsevier_boycott_time_for_libr.php">here</a> and Barbara Fister in the Library Journal <a href="http://lj.libraryjournal.com/2012/02/opinion/barbara-fister/joining-the-movement-a-call-to-action-peer-to-peer-review/">here</a>). And the Open Access movement is one I support. The concentration of control over journals by one for-profit publisher is clearly one of the core problems, and the questionable practices (e.g. “bundling”) that it can consequently employ are another.</p>
<p>But who (or rather <em>what</em>) exactly is to “blame” (if that’s the right thing to do) for this situation? Elsevier is behaving rationally – from a market-forces point of view anyway. Maximizing profits is what any private enterprise does, particularly one that is <a href="http://www.google.com/finance?cid=663502">publicly traded on stock exchanges</a>. Elsevier (the publisher) is owned by <a href="http://www.reedelsevier.com/investorcentre/sharepriceinformation/pages/home.aspx">Reed Elsevier</a> which also owns <a href="http://www.lexisnexis.com">Lexis Nexis</a> (which offers law information and services) and <a href="http://reedbusiness.com/index.html">Reed Elsevier Business</a> (which provides data services, information and marketing solutions to businesses). Is this a portfolio mix that should be permitted by law? After all there are <a href="http://en.wikipedia.org/wiki/Competition_law">anti-trust laws</a> that prohibit monopoly ownership in other domains.</p>
<p>One fundamental problem is that a public good (knowledge) has been commoditized, marketed and sold by a private, for-profit enterprise. The officials within Elsevier who are in charge of the company don’t have a lot of room to manoeuvre if they are to comply with the stock-market forces that urge them to forever greater profitability.</p>
<p>Here’s a suggestion to the signatories of the Elsevier boycott: go to your pension-fund manager (university or government) and find out if any of the mutual funds, exchange-traded funds or stock portfolios they own have stock in Reed Elsevier. I’m willing to bet they do. Pressure them to boycott those investments – I’m willing to bet that will have more influence.</p>
<p>Of course, the academic boycott has been heard, as evidenced by <a href="http://www.elsevier.com/wps/find/intro.cws_home/elsevieropenletter">Elsevier’s open-letter reply</a> of February 6th. That is one way to precipitate some kind of change towards greater openness of intellectual output. But let’s not delude ourselves into thinking that this is going to address the root problem: the inadequate funding of publicly-owned channels of knowledge dissemination.</p>
<p>Instead, could we harness this desire for change towards lobbying governments for more funding for university and independent open-access publishers (and tone down the rhetoric against Elsevier a little)?</p>
<p>P.S. I think it’s pretty important, for this post especially, to make it clear that these are my personal opinions (as are all my blog posts here) and in no way reflect the views of my employer.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/synthese.wordpress.com/888/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/synthese.wordpress.com/888/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=888&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/02/25/elsevier-boycott-academics-get-a-grip/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.elsevier.ca/images/ca/Elsevierlogo2.jpg" medium="image" />
	</item>
		<item>
		<title>Drummond Report &#8211; Implications for Ontario Universities</title>
		<link>http://synthese.wordpress.com/2012/02/19/drummond-report-implications-for-ontario-universities/</link>
		<comments>http://synthese.wordpress.com/2012/02/19/drummond-report-implications-for-ontario-universities/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 16:03:20 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Universities]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=875</guid>
		<description><![CDATA[The Drummond Report, as everyone in Ontario knows by now, offers 362 recommendations (why not 365? &#8211; they could have created a desktop annual calendar out of them!) for public service deficit-reduction.  Since this is a 500+ page report and I was especially interested in the report&#8217;s recommendations for colleges and universities (note that &#8220;PSE&#8221; in [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=875&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2012/02/drummond-report.png"><img class="alignleft  wp-image-877" title="drummond-report" src="http://synthese.files.wordpress.com/2012/02/drummond-report.png?w=157&#038;h=180" alt="" width="157" height="180" /></a>The <a href="http://www.fin.gov.on.ca/en/reformcommission/chapters/report.pdf">Drummond Report</a>, as everyone in Ontario knows by now, offers 362 recommendations (why not 365? &#8211; they could have created a desktop annual calendar out of them!) for public service deficit-reduction.  Since this is a 500+ page report and I was especially interested in the report&#8217;s recommendations for colleges and universities (note that &#8220;PSE&#8221; in the report means &#8220;Post Secondary Education&#8221;), I thought it might be useful to extract some of them here.</p>
<p>From the Executive Summary.</p>
<p><strong>Tuition Fees</strong>: they shouldn&#8217;t be frozen, but they also shouldn&#8217;t increase faster than inflation.  Freezing them would likely result in &#8220;further deterioration of the student experience — larger classes and less opportunity to debate and develop critical thinking skills&#8221;</p>
<p><strong>Teaching vs. Research</strong>: &#8220;Increasingly, universities are letting professors sacrifice teaching commitments to conduct more research. There must be a better balance; excellent research should not trump excellent teaching.&#8221;</p>
<p><strong>Overall Recommendations</strong>:</p>
<blockquote><p>The current system is unsustainable from both a financial and a quality perspective.</p>
<p>The Commission recommends the following:</p>
<ol>
<li>Contain government funding and institutional expenses;</li>
<li>Use differentiation to improve PSE quality and achieve financial sustainability;</li>
<li>Encourage and reward quality;</li>
<li>Revise research funding structures;</li>
<li>Maintain the current overall cap on tuition-fee increases, but simplify the framework;</li>
<li>Re-evaluate student financial assistance; and</li>
<li>Generate cost efficiencies by, for example, integrating administrative and back-office functions.</li>
</ol>
</blockquote>
<p>Now to expand on a few of these points (items 2 and 3 in particular).</p>
<p>2. Universities and colleges should not overlap in their functions (degree-granting) and programs (i.e. be more differentiated)</p>
<blockquote><p>The division of roles between the college and university systems should include the following features:</p>
<ul>
<li>After two years of study, college students who meet specific academic achievement criteria should be able to transfer into the university system;</li>
<li>Colleges should not be granted any new degree programs, but existing programs should be grandfathered;</li>
<li>The government should approve no new PSE programs until existing programs are rationalized and mandate agreements completed;</li>
<li>No new professional and specialized programs should be approved without a compelling business case; and</li>
<li>The Colleges of Applied Arts and Technology should work with the College of Trades to optimize the delivery of apprentice training in non-degree programs.</li>
</ul>
</blockquote>
<p>3. Encourage and Reward Quality means, among other things, focusing on more rewards for teaching:</p>
<blockquote><p><strong>Resources and rewards should be refocused towards teaching</strong>: Post-secondary education institutions should devote more resources to experience-based learning such as internships, allow for more independent study, develop problem-based learning and increase study abroad. Universities should be encouraged to include in their collective agreements flexible provisions with faculty regarding teaching and research workloads. Top-performing teachers and researchers should be recognized with the appropriate workloads and rewards. Eleven Ontario universities already have such flexibility; others should follow. Institutions should redesign incentive systems to reward excellent teachers, as they do now for researchers.</p>
<p><strong>Refocus provincial funding to reward teaching excellence</strong>: Provincial funding allocations should be linked to quality objectives, and the funding model should reward degrees awarded rather than just enrolment [sic] levels. Government and PSE institutions should work to ensure that the capacity to integrate ideas and create innovative solutions to problems is at the heart of the higher education experience. This will be critical to the economic and social success of Ontario, in an economy where graduates will be working over their career in ways that cannot even be imagined now.</p></blockquote>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/synthese.wordpress.com/875/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/synthese.wordpress.com/875/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=875&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2012/02/19/drummond-report-implications-for-ontario-universities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2012/02/drummond-report.png?w=262" medium="image">
			<media:title type="html">drummond-report</media:title>
		</media:content>
	</item>
		<item>
		<title>Review: &#8220;Mahout in Action&#8221;</title>
		<link>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/</link>
		<comments>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 15:23:46 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Book Review]]></category>
		<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Recommender service]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=714</guid>
		<description><![CDATA[In early September 2010 (I&#8217;m embarrassed to count how many months ago that was!) I received an Early Access (PDF) copy of &#8220;Mahout in Action&#8221; (MIA) from Manning Publications and was asked to write a review. There have been 4 major updates to the book (now no longer &#8220;early access&#8221;!) since then and although it is too [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=synthese.wordpress.com&#038;blog=666986&#038;post=714&#038;subd=synthese&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<div>
<p><img class="alignleft" src="http://www.manning.com/owen/owen_cover150.jpg" alt="" width="150" height="188" />In early September 2010 (I&#8217;m embarrassed to count how many months ago <em>that</em> was!) I received an Early Access (PDF) copy of &#8220;<a href="http://www.manning.com/owen/">Mahout in Action</a>&#8221; (MIA) from Manning Publications and was asked to write a review. There have been 4 major updates to the book (now no longer &#8220;early access&#8221;!) since then and although it is too late to fulfill their purpose in giving me early access for review (no doubt a supportive quote for the dust jacket or web site), I thought I&#8217;d nevertheless post my belated notes.</p>
<p><a href="http://lucene.apache.org/mahout/">Mahout is an Apache project</a> that develops scalable machine learning libraries for recommendation, clustering and classification. Like many other such software-documentation &#8220;in Action&#8221; books for Apache projects (Lucene / Hadoop / Hibernate / Ajax, etc.), the primary purpose of MIA is to complement the existing software documentation with both an explanatory guide for how to use these libraries and some practical examples of how they would be deployed.</p>
<p>First I want to ask: &#8220;how does one go about reviewing such a book&#8221;? Is it possible to dissociate one&#8217;s opinion about the book itself from one&#8217;s opinion of the software? If the software is missing an important algorithm, does this impugn the book in any way?</p>
<p>The answers to these questions are, I think, &#8220;yes&#8221; and &#8220;no&#8221; respectively. Hence, the following comments assess the book on its own merits and in relation to the software that it documents, not in relation to the machine learning literature at large. Indeed, the fact that this book is not a textbook on or an authoritative source for machine learning is made quite explicit at the beginning of the book and the authors make no claim to being experts in the field of Machine Learning.</p>
<p>It&#8217;s important to understand that Mahout came about in part as a refactoring exercise in the <a href="http://lucene.apache.org/">Apache Lucene</a> project, since several modules in Lucene use information retrieval techniques such as vector-based models for document semantics (see the survey paper by Peter Turney and Patrick Pantel &#8220;<a href="http://www.jair.org/media/2934/live-2934-4846-jair.pdf">From Frequency to Meaning: Vector Space Models of Semantics</a>&#8221;). The amalgamation of those modules with the open source collaborative filtering system (formerly called <em>Taste</em>) by co-author Sean Owen yielded the foundation for Mahout.</p>
<p>Thus, if there are gaps in the Mahout software it is an accident of history more than a design flaw.  Like most software &#8211; especially open-source software &#8211; Mahout is still &#8220;under construction&#8221;, as evidenced by its current version number (&#8220;0.5&#8221;). Even though many elements are quite mature, several others are missing, and whatever lacunae there are should be considered an opportunity to contribute to and improve this library rather than to criticize it.</p>
<p>One obvious source for comparison is <a href="http://www.cs.waikato.ac.nz/ml/weka/">Weka</a> &#8211; also an open-source machine learning library in Java. The book associated with this library &#8211; <a href="http://www.cs.waikato.ac.nz/~ml/weka/book.html">Data Mining: Practical Machine Learning Tools and Techniques</a> (Second Edition) by Ian H. Witten and Eibe Frank &#8211; was published in 2005 and has a much more pedagogical purpose than Mahout in Action. In contrast with MIA, &#8220;Data Mining&#8221; is much more of an academic book, published by academic researchers, whose purpose is to teach readers about Machine Learning.  In that way, these two books are complementary, particularly as there are no algorithms devoted to recommendations in Weka and many more varieties of classification and clustering algorithms in Weka than in Mahout.</p>
<p>The Mahout algorithms that are discussed in MIA include the following.</p>
<ul>
<li>Collaborative Filtering</li>
<li>User and Item based recommenders</li>
<li>K-Means, Fuzzy K-Means clustering</li>
<li>Mean Shift clustering</li>
<li>Dirichlet process clustering</li>
<li>Latent Dirichlet Allocation</li>
<li>Singular value decomposition</li>
<li>Parallel Frequent Pattern mining</li>
<li>Complementary Naive Bayes classifier</li>
<li>Random forest decision tree based classifier</li>
</ul>
<p>The integration of Mahout with Apache&#8217;s implementation of MapReduce &#8211; <a href="http://hadoop.apache.org/">Hadoop</a> &#8211; is no doubt the unique characteristic of this software. If you want to use a distributed computing platform to implement these kinds of algorithms, Mahout and MIA are the place to start.</p>
<p>On its own terms, then, how does the book fare? It is fair to say &#8211; for the quotable extract &#8211; that Mahout in Action is an indispensable guide to Mahout! I wish I had had this book 5 years ago when I was getting to grips with open source collaborative filtering recommenders!</p>
<p>P.S. This book fits clearly in the business model for open source Apache software &#8211; write great and useful software for free, but make the users pay for the documentation!  Which is only fair, I think, since $20 or so is not much at all for such a wealth of well-written software! The same can be said for Weka, whose 303 pages of software documentation still require the book to be useful.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/12/22/review-mahout-in-action/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.manning.com/owen/owen_cover150.jpg" medium="image" />
	</item>
		<item>
		<title>CISTI Sciverse Gadget App</title>
		<link>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/</link>
		<comments>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 18:06:26 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[CISTI]]></category>
		<category><![CDATA[Digital library]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=849</guid>
		<description><![CDATA[Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed a (beta) Sciverse gadget that gives searchers and researchers a window on CISTI&#8217;s electronic collection by taking the search term entered in Elsevier Hub and providing them with CISTI&#8217;s search results from a database of over 20 million [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.applications.sciverse.com/action/appDetail/298702?zone=main&amp;pageOrigin=appGallery&amp;activity=display"><img class="alignleft  wp-image-850" title="SearchAtCISTI" src="http://synthese.files.wordpress.com/2011/12/searchatcisti.jpg?w=221&#038;h=76" alt="" width="221" height="76" /></a> Betwixt the jigs and the reels, and with the help of several people at CISTI and Elsevier, I developed <a href="http://www.applications.sciverse.com/action/appDetail/298702?zone=main&amp;pageOrigin=appGallery&amp;activity=display">a (beta) Sciverse gadget</a> that gives searchers and researchers a window on CISTI&#8217;s electronic collection by taking the search term entered in Elsevier Hub and providing them with CISTI&#8217;s search results from a database of over 20 million journal articles.</p>
<p><img class="alignleft  wp-image-852" title="SciverseApps" src="http://synthese.files.wordpress.com/2011/12/sciverseapps1.jpg?w=174&#038;h=82" alt="" width="174" height="82" /></p>
<p>Next year, I plan to follow up with another Sciverse gadget for my <a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/">citation-based recommender</a> that uses the full power of <a href="http://developers.sciverse.com/api">Elsevier&#8217;s API into its collection content</a>.</p>
<p>I want to commend all and sundry at Sciverse Applications for this initiative.  Opening up bibliographic data and providing developers with a developer platform (a customized version of <a href="http://docs.opensocial.org/display/OS/Home">Google&#8217;s OpenSocial platform</a>) is exactly the right kind of thing to do, both to benefit third parties (they get access to an otherwise closed and proprietary data set) and to enhance their own search and discovery environment.</p>
<p>There are, already, several advanced and interesting applications on Sciverse. My favourites are <a href="http://www.applications.sciverse.com/action/appDetail/297955?zone=main&amp;pageOrigin=home&amp;activity=display">Altmetric</a> (winner of the Science Challenge prize &#8211; see the YouTube demo video below), NextBio&#8217;s <a href="http://www.applications.sciverse.com/action/appDetail/292667?zone=main&amp;pageOrigin=appGallery&amp;activity=display">Prolific Authors</a> and Elsevier&#8217;s <a href="http://www.applications.sciverse.com/action/appDetail/292651?zone=main&amp;pageOrigin=appGallery&amp;activity=display">Table Download</a>.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='460' height='289' src='http://www.youtube.com/embed/zhtuBsQCLMw?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p>And there will be more to come. An open marketplace like this where the principles of variation and natural selection can operate will, I predict, make for a richer diversity of useful search and discovery tools than any single organization can develop on its own.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/12/13/cisti-sciverse-gadget-app/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/12/searchatcisti.jpg" medium="image">
			<media:title type="html">SearchAtCISTI</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/12/sciverseapps1.jpg" medium="image">
			<media:title type="html">SciverseApps</media:title>
		</media:content>
	</item>
		<item>
		<title>What is &#8216;Data&#8217;?</title>
		<link>http://synthese.wordpress.com/2011/06/14/what-is-data/</link>
		<comments>http://synthese.wordpress.com/2011/06/14/what-is-data/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 02:59:43 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Information retrieval]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=820</guid>
		<description><![CDATA[&#8220;What does &#8216;data&#8217; mean to you?&#8221; I innocently asked various participants at JCDL 2011 today.  I had just come out of a very interesting panel discussion entitled &#8220;Big Data, Big Deal?&#8221; at which most of the discussion was about large amounts of proprietary text at http://www.hathitrust.org/ (some of the discussion was also about large [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><img class="alignleft" src="http://images3.wikia.nocookie.net/__cb20061127074519/memoryalpha/en/images/1/13/Data%2C_2364.jpg" alt="" width="117" height="143" />&#8220;What does &#8216;data&#8217; mean to you?&#8221; I innocently asked various participants at <a href="http://www.jcdl2011.org/">JCDL 2011</a> today.  I had just come out of a very interesting panel discussion entitled &#8220;Big Data, Big Deal?&#8221; at which most of the discussion was about large amounts of proprietary text at <a href="http://www.hathitrust.org/">http://www.hathitrust.org/</a> (some of the discussion was also about large amounts of music in the <a href="http://salami.music.mcgill.ca/">SALAMI project at McGill</a>).</p>
<p>Now I am very interested in text, text retrieval (and music IR too) and I found the panel discussion most rewarding.  But it wasn&#8217;t <em>about</em> what I had been expecting it to be about (from the title) and I was perplexed by this use of the term &#8220;data&#8221; in this context. After all, the subtitle of the JCDL 2011 conference is &#8220;Bringing Together Scholars, Scholarship and Research Data&#8221;.  So the context for &#8220;data&#8221; was (for me) &#8220;research data&#8221; in the sense of the term that is pretty much the same as in the first three sentences of the <a href="http://en.wikipedia.org/wiki/Data">Wikipedia entry for Data</a>:</p>
<blockquote><p>The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data (plural of &#8220;datum&#8221;) are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and then knowledge are derived.</p></blockquote>
<p>So I was somewhat taken aback by the argument that ensued. Everyone, it seems (except me), was quite happy to speak of &#8220;Big data&#8221; and &#8220;large amounts of text&#8221; as synonymous.  As though the streams of bytes that are common to readings from an NMR spectrometer, digital music and electronic journal articles were in all significant respects indistinguishable.</p>
<p>Of course, large volumes of byte-sequences share some kinds of problems like storage, preservation and search. But &#8220;text data&#8221; is a different kind of beast, isn&#8217;t it? For one thing, text typically has meaning &#8211; cognitive content that is different from, say, music or images or spreadsheets of temperature variations in Glasgow over the past 500 years. It has more structure too, as evidenced by how efficiently it compresses and how (relatively) easy it is to search.</p>
<p>I&#8217;m happy to speak of data <em>about</em> text that is inferred by the act of mining text.  Word frequencies, ngrams, term clusters, sentiment categories etc. fit the definition of &#8220;data&#8221; above. Even the textual &#8220;meta-data&#8221; about text is data of a certain kind. But the text itself just doesn&#8217;t seem to be that kind of thing (qualitative or quantitative attributes of a variable).</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/06/14/what-is-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://images3.wikia.nocookie.net/__cb20061127074519/memoryalpha/en/images/1/13/Data%2C_2364.jpg" medium="image" />
	</item>
		<item>
		<title>Learning from Watson</title>
		<link>http://synthese.wordpress.com/2011/02/19/learning-from-watson/</link>
		<comments>http://synthese.wordpress.com/2011/02/19/learning-from-watson/#comments</comments>
		<pubDate>Sat, 19 Feb 2011 19:57:05 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[Statistical Semantics]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=793</guid>
		<description><![CDATA[Now that Watson has convincingly demonstrated that machines can perform some natural language tasks more effectively than humans can (see a rerun of part of Day 1 of the Jeopardy contest), what is the proper conclusion to be drawn from it? Should we join hands with &#8220;confederates&#8221; like Brian Christian and rally against the invasion [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg"><img class="alignleft size-medium wp-image-794" title="IBM_Watson" src="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg?w=162&#038;h=162" alt="Watson" width="162" height="162" /></a>Now that <a href="http://www.ibm.com/innovation/us/watson/">Watson</a> has convincingly demonstrated that machines can perform some natural language tasks more effectively than humans can (see a <a href="http://www.youtube.com/watch?v=4PSPvHcLnN0">rerun of part of Day 1</a> of the Jeopardy contest), what is the proper conclusion to be drawn from it?</p>
<p>Should we join hands with &#8220;confederates&#8221; like Brian Christian and rally against the invasion of smart machines? (See his recent piece in the <a href="http://www.theatlantic.com/magazine/archive/2011/03/mind-vs-machine/8386/">Atlantic</a> and listen to his recent <a href="http://www.cbc.ca/day6/blog/2011/02/18/interview-forget-watson-this-is-the-real-test-of-ai/">radio interview on CBC</a>.)</p>
<p>Or do we conclude that machines are now (or soon will be) sentient and deserve to be spoken to with respect for their moral standing (see Peter Singer&#8217;s article &#8220;<a href="http://www.project-syndicate.org/commentary/psinger57/English">Rights for Robots</a>&#8221;)? Or should we, like <a href="http://www.nserc-crsng.gc.ca/Prizes-Prix/Herzberg-Herzberg/Profiles-Profils/Hinton-Hinton_eng.asp">NSERC Gold Medal Award</a> winner <a href="http://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoffrey Hinton</a>, be scared about the long-term social consequences of intelligent robots designed to replace soldiers (listen to his interview on the future of <a href="http://en.wikipedia.org/wiki/Artificial_intelligence">AI machines</a> on <a href="http://www.cbc.ca/video/news/audioplayer.html?clipid=1803608455">CBC&#8217;s Quirks and Quarks</a>)?</p>
<p>Before coming to any definite conclusion about how &#8220;like&#8221; us machines can be, I think we should consider how these machines do what they do.  The <a href="http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf">survey paper in AI Magazine</a> about the design of &#8220;DeepQA&#8221; by the Watson team gives some indications of the general approach:</p>
<blockquote><p>DeepQA is a massively parallel, probabilistic evidence-based architecture. For the Jeopardy Challenge, we use more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses&#8230;.</p>
<p>The overarching principles in DeepQA are <em>massive parallelism</em>, <em>many experts</em>, <em>pervasive confidence estimation</em>, and <em>integration of shallow and deep knowledge</em>.</p></blockquote>
<p>Is this the right model for creating artificial cognition? Probably not. As Maarten van Emden and I argue in a recent paper on the <a href="http://web.ncf.ca/andre/publications/ChineseRoomHumanWindow.pdf">Chinese Room argument and the &#8220;Human Window&#8221;</a>, the question of whether a computer is simulating cognition cannot be decided by how effectively a computer solves a chess puzzle (for instance) but rather by the mechanism that it uses to achieve the end.</p>
<p>In this instance DeepQA uses and combines a number of different techniques from NLP, machine learning, distributed processing and decision theory. This is not likely to be an accurate representation of what humans actually do, but it is undeniably successful at the task (see <a href="http://www.youtube.com/watch?v=v5CPGMZteFQ&amp;feature=player_embedded">this talk on YouTube</a> about how IBM addressed the Jeopardy problem).</p>
<p>Geoff Hinton (in the radio interview mentioned above) speculates that Watson is a feat of special-purpose engineering but that the general-purpose solution &#8211; a large neural network that simulates the learning abilities of the brain &#8211; is what the project of AI is really about.</p>
<p>What we suggest in our Human Window paper is that one criterion we can use to determine whether machines are performing adequate simulations of what humans do is whether or not humans are able to follow the steps that the machine is undertaking. On that criterion, I think it&#8217;s safe to say that Watson &#8211; although very impressive &#8211; isn&#8217;t quite there yet.</p>
<p>P.S. If you have the patience, I recommend watching a <a href="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov">BBC debate</a> from 1973 between <a href="http://en.wikipedia.org/wiki/James_Lighthill">Sir James Lighthill</a>, <a href="http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)">John McCarthy</a> and <a href="http://en.wikipedia.org/wiki/Donald_Michie">Donald Michie</a> about whether AI is possible. The context of this video is the &#8220;Lighthill Affair&#8221; in 1972, recently <a href="http://vanemden.wordpress.com/2011/02/18/from-the-chronicles-of-scruffy-versus-neat-the-lighthill-affair/">chronicled on van Emden&#8217;s blog</a> (note that the audio on this thumbnail video is rather out of synch!).</p>
<p>It&#8217;s amazing how spectacularly wrong an amateur in artificial intelligence (Prof. Lighthill was an applied mathematician specializing in fluid dynamics) can be about the possibility of machines simulating intelligent behaviour. It is a real tragedy that Sir James Lighthill&#8217;s ideological biases had such disastrous consequences for AI research funding in the UK. His attitude reminds me of <a href="http://en.wikipedia.org/wiki/Samuel_Wilberforce">Samuel Wilberforce</a>&#8217;s objections to Darwin&#8217;s theory of evolution. I find it astonishing that this BBC debate was so civilized in its demeanour.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2011/02/19/learning-from-watson/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
<enclosure url="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov" length="169265179" type="video/quicktime" />
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2011/02/ibm_watson.jpg?w=300" medium="image">
			<media:title type="html">IBM_Watson</media:title>
		</media:content>
	</item>
		<item>
		<title>Mendeley Data vs. Netflix Data</title>
		<link>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/</link>
		<comments>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 01:05:30 +0000</pubDate>
		<dc:creator>Andre Vellino</dc:creator>
				<category><![CDATA[Citation]]></category>
		<category><![CDATA[Collaborative filtering]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Digital library]]></category>
		<category><![CDATA[Recommender]]></category>
		<category><![CDATA[Recommender service]]></category>

		<guid isPermaLink="false">http://synthese.wordpress.com/?p=766</guid>
		<description><![CDATA[Mendeley, the on-line reference management software and social networking site for science researchers, has generously offered up a reference dataset with which developers and researchers can conduct experiments on recommender systems. This release of data is their reply to the DataTel Challenge put forth at the 2010 ACM Recommender Systems Conference in Barcelona. The paper published by computer scientists at Mendeley, [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.mendeley.com"><img class=" alignleft" src="http://www.mendeley.com/graphics/commonnew/logo-mendeley_1284377719.png" alt="" width="345" height="81" /></a></p>
<p><a href="http://www.mendeley.com/">Mendeley</a>, the on-line reference management software and social networking site for science researchers, has generously offered up a <a href="http://dev.mendeley.com/datachallenge/">reference dataset</a> with which developers and researchers can conduct experiments on recommender systems. This release of data is their reply to the <a href="http://adenu.ia.uned.es/workshops/recsystel2010/datatel.htm">DataTel Challenge</a> put forth at the 2010 ACM Recommender Systems Conference in Barcelona.</p>
<p>The paper published by computer scientists at Mendeley, which accompanies the dataset (<a href="http://www.mendeley.com/research/sei-whale/">bibliographic reference</a> and <a href="http://www.mendeley.com/download/public/19900/3568420111/713e027c0c0b195d08f87da30f65bd668a3784a1/dl.pdf">full PDF</a>), describes the dataset as containing boolean ratings (read / unread or starred / unstarred) for about 50,000 (anonymized) users and references to about 4.8M articles (also anonymized), 3.6M of which are unique.</p>
<p>I was gratified to note that this is almost exactly the user-item ratio (1:100) that I indicated in my <a href="http://goo.gl/Bc64">poster at ASIS&amp;T 2010</a> was typically the cause of the data sparsity problem for recommenders in digital libraries. If we measure the sparseness of a dataset by the number of edges in the bipartite user-item graph divided by the total number of possible edges, Mendeley gives 2.66E-05. Compared with the sparsity of Netflix &#8211; 1.18E-02 &#8211; that&#8217;s a difference of nearly 3 orders of magnitude!</p>
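<p>As a rough check on those figures, the sparsity measure described above (observed user-item edges divided by possible edges) can be recomputed from the rounded counts quoted elsewhere in this post. This is only a sketch: the function name <code>density</code> is my own label, and the inputs are approximations (about 50,000 users, 4.8M references and 3.6M unique articles for Mendeley; roughly 100M ratings, 480K users and 17K titles for Netflix), so the results should agree with the numbers above only to about two significant figures.</p>

```python
import math


def density(edges, users, items):
    """Fraction of possible user-item edges that are actually present."""
    return edges / (users * items)


# Rounded counts taken from this post; treat them as approximations.
mendeley = density(edges=4.8e6, users=50_000, items=3.6e6)
netflix = density(edges=100e6, users=480_000, items=17_000)

print(f"Mendeley: {mendeley:.2e}")
print(f"Netflix:  {netflix:.2e}")
print(f"Orders of magnitude apart: {math.log10(netflix / mendeley):.1f}")
```

<p>With these rounded inputs the ratio comes out at a little under three orders of magnitude, which is consistent with the comparison above.</p>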
<p>But raw sparsity is not all that matters. The number of users per movie is much more evenly distributed in Netflix than the number of readers per article in Mendeley, i.e.  the user-item graph in Netflix is more connected (in the sense that the probability of creating a disconnected graph by deleting a random edge is much lower).</p>
<p>In the Mendeley data, out of the 3,652,286 unique articles, 3,055,546 (83.6%) were referenced by only 1 user and 378,114 were referenced by only 2 users. Fewer than 6% of the articles referenced were referenced by 3 or more users. [The most frequently referenced article was referenced 19,450 times!]</p>
<p style="text-align:center;"><a href="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg"><img class="size-full wp-image-772  aligncenter" title="Mendeley-Articles" src="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg?w=460&#038;h=262" alt="" width="460" height="262" /></a></p>
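<p>The shares behind that long tail follow directly from the article counts quoted above. A minimal sketch, assuming only those three counts (the variable names are mine, and the &#8220;3 or more users&#8221; bucket is simply whatever remains after subtracting the first two):</p>

```python
# Article counts quoted in the post (Mendeley dataset).
total_articles = 3_652_286
one_reader = 3_055_546
two_readers = 378_114

# Whatever is left must have been referenced by three or more users.
three_or_more = total_articles - one_reader - two_readers

shares = {
    "1 user": one_reader / total_articles,
    "2 users": two_readers / total_articles,
    "3+ users": three_or_more / total_articles,
}
for label, share in shares.items():
    print(f"{label:9s} {share:6.1%}")
```

<p>Roughly 84% of the articles have a single reader, about 10% have two, and just under 6% have three or more, which is why similarity between users or between items has so little to work with here.</p>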
<p>By comparison, in the Netflix dataset (which contains ~100M ratings from ~480K users on ~17K titles), over 89% of the movies had been rated by 20 or more users. (See <a href="http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/">this blog post</a> for more aggregate statistics on Netflix data.)</p>
<p style="text-align:center;"><a href="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg"><img class="alignnone size-full wp-image-789" title="Netflix-movies1" src="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg?w=460&#038;h=243" alt="" width="460" height="243" /></a></p>
<p>I think that user or item similarity measures aren&#8217;t going to work well with the kind of distribution we find in Mendeley data. Some additional information such as article citation data or some content attribute such as the categories to which the articles belong is going to be needed to get any kind of reasonable accuracy from a recommender system.</p>
<p>Or, it could be that some method like the heat-dissipation technique introduced by physicists in the paper &#8220;<a href="http://doc.rero.ch/lm.php?url=1000,43,2,20100318115452-QH/zha_sad.pdf">Solving the apparent diversity-accuracy dilemma of recommender systems</a>&#8221; published in the Proceedings of the National Academy of Sciences (PNAS) could work on such a sparse and loosely connected dataset. The authors claim that this approach works especially well for sparse bipartite graphs (with no ratings information). We&#8217;ll have to try and see.</p>
]]></content:encoded>
			<wfw:commentRss>http://synthese.wordpress.com/2010/11/02/mendeley-data-vs-netflix-data/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/8e2e3a01bf33747391457d97e0df832b?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vellino</media:title>
		</media:content>

		<media:content url="http://www.mendeley.com/graphics/commonnew/logo-mendeley_1284377719.png" medium="image" />

		<media:content url="http://synthese.files.wordpress.com/2010/10/mendeley-articles.jpg" medium="image">
			<media:title type="html">Mendeley-Articles</media:title>
		</media:content>

		<media:content url="http://synthese.files.wordpress.com/2010/11/netflix-movies1.jpg" medium="image">
			<media:title type="html">Netflix-movies1</media:title>
		</media:content>
	</item>
	</channel>
</rss>
