Visualizing Netflix Rental Patterns January 10, 2010
Posted by Andre Vellino in Recommender service, User Interface, Visualization.
The recent NY Times mashup of Netflix rental data with geographical data based on postal-codes illustrates just how informative such visualizations can be.
Take for instance the distribution of rentals in Washington DC of the movie Milk – based on the true story of Harvey Milk, the American gay activist who fought for gay rights and became California’s first openly gay elected official…
… and compare that with the distribution of rentals for The Proposal – a (straight) romantic comedy.
I think you could be forgiven for concluding that residents in the downtown core of Washington DC are more socially liberal than in its residential suburbs (or, of course, that downtown residents prefer serious historical dramas to fictional comedies – or both).
Imagine if you could do the same thing with labeled Bayesian or LSA models that characterize classes or intersections of classes of Netflix users (e.g. class types that might be labeled something like “highly-educated-and-well-paid-government-employee” vs. “unemployed-manufacturing-blue-collar-worker”). That could form the basis of a nice explanation interface to a movie recommender system.
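To make the LSA idea a little more concrete, here is a minimal sketch, assuming a tiny made-up rental matrix (only the titles Milk and The Proposal come from this post; the other columns, the users and the helper names are invented for illustration). It projects users onto the top singular directions of the matrix and compares them there, so users with the same tastes land close together. Attaching human-readable labels to the resulting groups, which is the hard part of an explanation interface, is not attempted here.

```python
import numpy as np

# Made-up rental matrix (1 = rented): rows are users, columns are movies.
# Only "Milk" and "The Proposal" come from the post; the rest is invented.
movies = ["Milk", "another drama", "The Proposal", "another comedy"]
rentals = np.array([
    [1, 1, 0, 0],   # user 0: dramas only
    [1, 1, 0, 0],   # user 1: dramas only
    [0, 0, 1, 1],   # user 2: comedies only
    [0, 0, 1, 1],   # user 3: comedies only
    [1, 1, 1, 0],   # user 4: mostly dramas
], dtype=float)


def latent_user_vectors(matrix, k):
    """Project each user onto the top-k singular directions (the core LSA step)."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


users = latent_user_vectors(rentals, k=2)
same_taste = cosine(users[0], users[1])       # two drama-only renters
different_taste = cosine(users[0], users[2])  # drama-only vs. comedy-only
print(f"same taste: {same_taste:.2f}, different taste: {different_taste:.2f}")
```

With real data the matrix would be far larger and sparser, and the number of latent dimensions would have to be chosen empirically; the two used here are just enough to separate the toy groups.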
Nexus One January 5, 2010
Posted by Andre Vellino in Information retrieval, User Interface.
I can’t say I care much that the new Google Phone “Nexus One” is not available in Canada. Nor even that Google has a phone to sell for that matter.
What I am impressed by, though, is the marketing department’s (yes, them again) “virtual tour” of this new device.
http://www.google.com/googlephone/tour/
They succeed at giving you a really good impression of what it looks like, how it would feel in your hand and how you would use it, and at inducing in the potential customer an almost tangible desire to own one.
This new gadget must be the only device to support all of AAC, MP3, Ogg Vorbis and WAV. Why didn’t they throw in WMA while they were at it, I wonder?
Chrome OS January 4, 2010
Posted by Andre Vellino in Collaborative filtering.
You’ve got to marvel at the sheer genius of Google’s marketing arm for making such a compelling video out of such an old idea.
Remember the Network Computer (1997) or the Thin Client (1993)?
This time it may work, of course, now that we have the required bandwidth to the home.
Why I Bought a CD Player December 31, 2009
Posted by Andre Vellino in Attention, Information, Open Source.
My trusted 1992 NAD CD player decoded its last bits a few months ago. I wanted to replace it with something – the question was: what?
In the 21st century, one would think that the obvious answer to this question is a networked media player like the Sonos Digital Music System or the Logitech Squeezebox Duet. Music is digital now – why not store it on a server and play it back on a special purpose computer? After all, isn’t that what even an old fashioned CD player is already?
I ended up choosing another CD player (a Marantz CD5300 to be exact) instead of a networked player. I was influenced in part by my spouse’s preference for handling tangible things (CDs). I agree with her that there’s something about taking a disc and playing it that makes the listener less “remote” from the music / composer / performer than searching / navigating / browsing a collection of files. As well, I think selecting a disc requires a greater degree of purposeful intention for listening attentively than the selection of a play-list on an iPod-like device. Moreover, an “album” isn’t just a random collection of songs or tracks – it is, itself, a composition of sorts. This is all the more obvious with classical music, where the unit-to-be-listened-to isn’t the movement but the Concerto or the Symphony.
One reason I considered the networked player at all is that I wanted to get rid of the clutter of a CD collection. But the problem with ripping CDs to disk is dealing with the metadata. Services like the free freedb or even the commercial Gracenote are great for finding the (likely) metadata (title / composer / performer) that corresponds to your disc, but often the information offered by these services is incorrect or inconsistently specified. There are also many redundant entries. My experience is that the physical clutter and organization problem just gets replaced by a digital clutter and organization problem.
There are also the issues of sound quality and predictability of format. Say what you will, but there is a perceptible difference between a 320 kb/s MP3 encoding and the WAV data from which it was extracted. My benchmark is simple: does the MP3 encoding of the second movement of Shostakovich’s 10th symphony still send a chill up my spine? It doesn’t.
One may retort: why don’t you use a format like FLAC, which offers 2x compression with no loss of information? One reason is that the WAV format on compact discs has been around for 30 years, FLAC only 10. And, truth be told, I have the nagging suspicion that music monopoly machines like iTunes will relegate Open Source formats like FLAC and the excellent (lossy) compression format Ogg Vorbis to oblivion.
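On the “no loss of information” point, a minimal sketch of what lossless means in practice: compress some 16-bit PCM samples, decompress them, and check that every bit comes back. This uses Python’s general-purpose zlib as a stand-in, not FLAC itself, so the compression ratio it prints says nothing about what FLAC achieves on real music; the test tone and its parameters are assumptions chosen for illustration.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit PCM at the CD sample rate.
# (A synthetic tone, not real music; chosen only to have something to compress.)
SAMPLE_RATE = 44100
samples = [int(20000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(pcm, 9)    # stand-in for a lossless audio codec
restored = zlib.decompress(compressed)

print(len(pcm), "bytes raw,", len(compressed), "bytes compressed")
print("bit-for-bit identical:", restored == pcm)
```

A lossy encoder such as MP3 cannot pass that final equality check by design, which is the distinction at stake here.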
There, I’m out of the closet as a retro-techno-laggard who doesn’t believe that WAV is an improvement over analog vinyl nor that compression formats are an improvement over 16-bit PCM encoding. I do listen to music in FLAC and Ogg Vorbis on one of the few portable players that supports them (the wonderful and inexpensive Sansa Clip), but I’m still keeping my CDs for the sake of posterity.
The (long tail) End of the Book December 4, 2009
Posted by Andre Vellino in CISTI, Digital library, General, Information.
I would venture to guess that Noah Richler (son of Mordecai) is not the first journalist to predict the demise of the book. In his article in the October edition of The Walrus, Richler says:
….the book industry’s digital future, one that was comfortably far off even as the music industry was being decimated, is now ineluctably and forcefully here.
Richler asserts that instead of propagating a greater variety of books to the general public via the effect of the long tail, the web has benefited blockbusters:
Rather than connecting the public to the glorious cornucopia of the Long Tail, the effect of the web has been to serve fewer blockbusters better.
Worse, he says:
Today, just a small number of books compete for consumers’ fleeting attention. The tail is longer, but it is also thinner.
Exacerbating the problem for the publishing industry is the increasing amount of free content, especially via Google Books:
By copying the books first and negotiating later, Google has, in effect, established itself as the biggest pirate in the world.
Nevertheless,
…digital is where the puck is going to be, and publishers have no choice but to skate toward it.
True enough, but “digital” also means “ephemeral” – which is fine for Robert Ludlum novels, but less so, perhaps for the Origin of Species and other milestones in the history of human knowledge.
Contra some bloggers who predict the end of the book by pointing to the growing sales of e-Readers like the Kindle, I think there’s a very important difference between the playback of video or music on iPods and the reading of books on e-Readers.
First, video and audio have to be viewed or heard with some kind of playback device (CD player / tape player / video player). Not so with print. You can read a paper book with no playback device: doing so has no up-front device costs (Kindle / iPod) and no power requirements, and the book is invulnerable to electro-magnetic pulses. Furthermore, paper books already have all the right digital rights management (DRM) mechanisms built in: you can borrow them, you own them and you can re-sell them.
Second, the typical “unit” in the audio industry is the 5-minute song. The typical “unit” in literature (i.e. book) is ~ 300 pages. The amount of time and attention required to “consume” a book is several orders of magnitude greater than a song. This makes a huge difference to how human beings like to ingest this content. I can spend hours reading a book (at 600 dots per inch) – less on a low-resolution device like a Kindle. Audio and video are much better, by comparison.
Last but not least is the impermanence inherent in digital formats and media. Entrusting our knowledge to digital formats (e.g. PDF) and media (e.g. hard disks) commits us to an ephemeral cultural and intellectual memory. While digitization has its virtues (e.g. searching, social tagging, clustering), it also harbours (invisible) dangers (e.g. digital rot, ease of forgery, dependence on rapidly changing software and hardware developed at great expense in the private sector).
The inherent conservatism in librarianship that values, organizes and manages paper is a welcome counterweight to the near-term myopia of digital early-adoption. Both have their place in the 21st century but I worry about our putting all our eggs in the digital basket.
DeepDyve’s iTunes Business Model October 27, 2009
Posted by Andre Vellino in Digital library, Information retrieval, Search.
DeepDyve appears to have adopted an iTunes-like business model: $0.99 rentals for scientific research articles!
I like many things about the search engine – the way one can enter entire paragraphs of text as a query block, for instance. You can use that feature in PLoS ONE (Public Library of Science), though it is not obviously available on the DeepDyve site itself.
I don’t think that knowledge can be commodified in this way, though. It doesn’t look like it is going to be a sustainable business model.

