
Protecting Yourself from Spies September 7, 2013

Posted by Andre Vellino in Ethics, Human Rights, Information.


I once worked for a company that makes the kind of software that the NSA and CSIS appear to be using to monitor email and internet metadata (see the Guardian for a quick survey of the metadata that exists in different digital media).

I might add that I think there is nothing morally wrong with the surveillance technology itself – indeed it can be used to protect privacy and prevent harm. It is more a question of whether our privacy rights are violated when the technology is used and whether those rights should be relinquished to the state for the greater good.

The recent revelation that the presumption of privacy is erroneous, even when engaging in encrypted transactions, adds fuel to my concern that people don’t make informed decisions about what information they disclose and that they don’t even try to protect their information when it is quite easy to do so. This post highlights some software solutions you can use to reduce the likelihood that your private information is monitored.

Web Browsing

Let’s start with web browsing. The amount of information that a web server can glean from your web browser’s attempt to connect with it is quite voluminous. To see what a server can find out about your browser and computer, try this link:


Furthermore, the combination of these browser characteristics, while it may not provide personal identity information, can still identify you uniquely.  Try this test from the Electronic Frontier Foundation:


When I try it, they assert that my browser’s collection of information, i.e. my browser “fingerprint”, is unique among the 3M or so they have tested.
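To make the idea of a fingerprint more concrete, here is a minimal Python sketch. It is not the EFF's method: the attribute names and values are hypothetical, and hashing them together is just one simple way to turn a combination of characteristics into a single identifier. The second function shows roughly how much identifying information it takes to be unique among 3M browsers.

```python
import hashlib
import math


def fingerprint(attributes: dict) -> str:
    """Combine browser attributes into a single identifier (illustrative only)."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def bits_to_single_out(population: int) -> float:
    """Bits of information needed to pick one browser out of `population`."""
    return math.log2(population)


# Hypothetical attribute values, made up for this example.
browser = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS)",
    "screen": "1440x900x24",
    "timezone_offset": "-240",
    "fonts": "Arial,Courier,Times",
    "plugins": "PDF Viewer,Flash",
}

fp = fingerprint(browser)
bits = bits_to_single_out(3_000_000)
print(fp[:16], round(bits, 1))
```

Being unique among 3M browsers corresponds to about 21.5 bits of identifying information. No single attribute has to be revealing: a handful of ordinary-looking settings can add up to that between them.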

There is not much you can do to limit the uniqueness of your browser’s fingerprint other than having a generic computer and a generic browser configuration.  Using the TOR browser / network (see below) helps to reduce the uniqueness of your browser-fingerprint, but there are tradeoffs (response speed for one thing).


There was a time when I thought that HTTP-Secure (“https”) was a reliable way of ensuring that information between your browser and the end-point server (e.g. a bank) could not be intercepted or tampered with. The revelation that the NSA is able to decrypt such communications reduces my confidence that this method is “secure” in any meaningful way, but at least it offers some degree of assurance that not just anybody can either read or tamper with such transactions.

If that level of confidence is sufficient for you, then you might consider adding the HTTPS Everywhere plugin (brought to you by the Electronic Frontier Foundation) to your browser.


The TOR browser / encrypted network system describes itself as

…free software and an open network that helps you defend against a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security

In principle, the Onion Routing technology behind it offers the end-user a high degree of anonymity and untraceability. However, if anyone can break SSL, the next step is to break TOR.

File and file system encryption

If you want to protect computer files, or indeed a whole file system (e.g. in case your laptop is stolen or your USB key is lost), you should try TrueCrypt. It offers operating-system level, on-the-fly encryption, file-level encryption and partition encryption.  Best of all, TrueCrypt is open source (so you can check for yourself, if you have the patience and know-how, that there are no backdoors for the NSA or CSIS).

Also, for Windows PCs (or Wine-enabled Macs), AxCrypt is a pretty good and easy-to-use tool for encrypting files.


Securing email is a bit trickier. There is no meaningful way to encrypt e-mail metadata. The very nature of e-mail addressing and store-and-forward protocols like SMTP requires that metadata, which, of course, is a fundamental design flaw with email.
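A small Python sketch shows why. It builds a message with the standard library's email package; the addresses and subject are made up, and the body is only a placeholder standing in for real encrypted content. Even if the body were properly encrypted, the headers would still travel in the clear.

```python
from email.message import EmailMessage

# Hypothetical sender, recipient and subject, for illustration only.
msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Lunch?"

# Placeholder for an encrypted body; this is NOT real ciphertext.
msg.set_content(
    "-----BEGIN PGP MESSAGE-----\n"
    "(placeholder ciphertext)\n"
    "-----END PGP MESSAGE-----\n"
)

# Split the message as it would be sent into headers and body.
wire = msg.as_string()
headers, _, body = wire.partition("\n\n")

print(headers)  # who, to whom and about what: all readable
print(body)     # only this part can be protected by encryption
```

Anyone handling the message along the way can read the headers, because the mail servers themselves need them to deliver it.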

However, if you want to protect the content of what you say from prying eyes, you can try Gnu Privacy Guard (GPG). Its precursor was PGP (Pretty Good Privacy) and Edward Snowden thinks it works.


It appears that most people think that their privacy is worth sacrificing in exchange for safety and protection by government.  This is short-sighted. A benevolent government in whose integrity you trust might do the right thing at any point in time, but the issue is a matter of principle. You should not relinquish your right to privacy to the state.

As Bruce Schneier wrote in The Guardian:

By subverting the internet at every level to make it a vast, multi-layered and robust surveillance platform, the NSA has undermined a fundamental social contract…..

We have a moral duty to [dismantle the surveillance state], and we have no time to lose.

In the meantime we can at least do better to protect ourselves.

Steve Jobs was Right about AppleTV UI April 22, 2012

Posted by Andre Vellino in Information, User Interface.

AppleInsider reported a few weeks ago that Steve Jobs rejected – as long as 5 years ago – the newly introduced Apple TV user interface. Predictably, Steve was right: the new UI for AppleTV has some major flaws in not just one but several dimensions: usability, cognitive modeling and information organization.

Consider this snapshot of the old UI:

The top third of the screen is reserved for image thumbnails that correspond to offerings in the highlighted service.  The remote’s navigation buttons change only the horizontal and vertical menu choices and the menus correspond to the categories of services available. [The top-level thumbnails are also accessible to get to the item directly.]

Admittedly there are some problems with this way of organizing the user’s entertainment options.  One is that the top level categories are not all the same kind of thing.  “Internet” is a mode of delivery (which, of course, is also the mode of delivery for the rest of AppleTV content), whereas the others are descriptive of the kind of objects that are below the main menu item. What “Internet” means, clearly, is “other, non-Apple applications”.  In addition, more recent AppleTV top-level menus also have the “Computer” category, meaning “Content streamed from your local computer running iTunes”, adding a second source-centered category.

However, at least the old interface makes some attempt at grouping content. Furthermore, the interface for the top-level navigation resembles in structure the navigation system implemented for each of the applications.  The interface has the consistency hallmark of Apple interfaces generally: learn the interface for one application and you know (more or less) how all the others behave.

Contrast this with the new interface.  In some respects, it is similar to the old one – thumbnails of content-images appear at the top of the screen, as expected and the content sources are more or less the same.

However, the artificial segregation by source or kind is eliminated altogether: all the applications are on the same footing, iPad-App style.

The first serious problem starts manifesting when you scroll just one line down: the 1/2-page sized thumbnails disappear altogether.  Yet the selected applications (I bet) are still generating those thumbnails – you just can’t see them any more.

Right away, this gives screen real estate dominance to the first row of applications – Apple iTunes applications, naturally. Furthermore, you can’t go straight to the items in the thumbnails because you can’t see them any more.

The second major flaw comes from the mixed-mode cognitive models.  The first-level application-selection mode is (vaguely) iPad-like (without the ability to group apps, rearrange them or create screen-pages). However, once you’ve selected an application you’re back to the (more familiar and sensible) menu-navigation system.

What’s worse, though, is that the menu system for each application is now no longer consistent.  “Movies” (short for “iTunes Movie Store”) has a Mac-style top-level menu-bar rather than a right-side menu navigation bar like all the other applications. Gone is the consistent Apple look-and-feel.

If only the user at least had the ability to group applications as they see fit and to delete the unwanted ones (why not? The iPod/iPad allows that).

There’s just no doubt about it.  Steve was right.

Scientific Research Data August 23, 2010

Posted by Andre Vellino in Data, Information, Information retrieval, Open Access.

Scientific research data is, without a doubt, a central component in the lifecycle of knowledge production. For one thing, scientific data is critical to the corroboration (or falsification) of theories. Equally important to the process of scientific inquiry is making this data openly available to others – as is vividly demonstrated by the so-called “ClimateGate” controversy and the more recent cloud on Marc Hauser’s research data on primate cognition. The public accessibility of data enables open peer review and encourages the reproducibility of results.

Hence the importance of data management practices in 21st century science libraries: the curation of, access to and preservation of scientific research data sets will be critical to the future of scientific discourse.

It is true that “Big Science” has been in the business of curating “reference data” for years. Institutional data centers in many disciplines have been gathering large amounts of data in databases that contain the fruit of years of research. GenBank, for instance, is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (containing over 150,000 sequence records.)

However, other kinds of data gathered by scientists are either transient or highly context-dependent and are not being preserved for the long term benefit of future research either by individuals or by institutions. This might not be so serious for those data elements that are reproducible – either by experiment or simulation – but much of it, such as data on oil-content and dissipation rates in the Gulf of Mexico water column in 2010, is uniquely valuable and irreproducible.

As I indicated in a previous post, one development that will help redress the problems endured by small, orphaned and inaccessible datasets is the emergence of methods of uniquely referencing datasets such as the data DOIs that are being implemented by DataCite partners.  With the combination of data-deposit policies by science research funding agencies (such as NSF in the US and NSERC in Canada) and peer-recognition from university faculty, the status of contributions to data repositories, data publication and referencing will soon grow to match the present status of scholarly publications.

In parallel, the growing “open access for Data” movement and other initiatives to increase the availability of data generated by government and government-funded institutions (including NASA, the NIH and the World Bank) are now well underway in a manner consistent with the OECD’s principles, which, incidentally, offer a long and convincing list of economic and social benefits to be obtained from making scientific research data accessible.

In particular, the United States, the UK and Australia are spearheading the effort of making public and scientific research data more accessible. For instance, in the U.S., the National Science and Technology Council (NSTC)’s recent report to President Obama details a comprehensive strategy to promote the preservation of and access to digital scientific data.

These reports and initiatives show that the momentum is building globally to realize visions that have been articulated in principle by several bodies concerned with the curation and archiving of data in the first decade of the 21st century (see To Stand the Test of Time and Long Lived Scientific Data Collections).

In Canada, several similar reports such as the Consultation on Access to Scientific Research Data and the Canadian Digital Information Strategy also point to the need for the national stewardship of digital information, not least scientific data sets. Despite much discussion, systematic efforts in the stewardship of Canadian digital scientific data sets are still only at the preliminary stages.  While there are well managed and curated reference data in domains such as earth science (Geogratis) and Astronomy (Canadian Astronomy Data Centre) which have a community of specialist scientific users and whose needs are generally well met, the data-management needs of individual scientists in small, less well funded research groups are not being met: their data is either impossible to find or lost.

One impediment to the effective bibliographic curation of data sets is the absence of common standards. There are currently “no rules about how to publish, present, cite or otherwise catalogue datasets.” [Green, T (2009), “We Need Publishing Standards for Datasets and Data Tables”, OECD Publishing White Paper, OECD Publishing]

CISTI’s Gateway to Scientific Data sets and other such national sites (e.g. the British National Archives of Datasets) that aggregate information about data sets use bibliographic standards (e.g. Dublin Core) for representing meta-data.  The advantage is that these standards are not domain-dependent yet sufficiently rich to express the core elements of the content needed for archiving, storage and retrieval.  However, these metadata standards, developed for traditional bibliographic purposes, are not (yet) sufficiently rich to fully capture the wealth of scientific data from all disciplines, as I argued in a previous post.
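To illustrate, here is a rough Python sketch of what a Dublin Core description of a data set might look like. The element names (title, creator, date, type, format, identifier) and the namespace are standard Dublin Core; the "record" wrapper element and all the values, including the elided DOI, are hypothetical and not taken from any real repository.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Wrap simple name/value pairs in Dublin Core elements."""
    # "record" is an arbitrary wrapper chosen for this sketch.
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical data set description, for illustration only.
dataset = {
    "title": "Hypothetical water-column measurements",
    "creator": "Example Research Group",
    "date": "2010",
    "type": "Dataset",
    "format": "text/csv",
    "identifier": "doi:10.xxxx/example",
}

xml_text = ET.tostring(dublin_core_record(dataset), encoding="unicode")
print(xml_text)
```

The record says who made the data set, when, and where to find it, which is enough for a catalogue. It has no obvious place for units of measurement, instruments or sampling methods, which is the kind of gap I have in mind.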

One of the major concerns when deciding on the feasibility of creating a data repository is the cost associated with the deposit, curation and long-term preservation of research data. Typically, costs depend on a variety of factors including how each of the typical phases (planning, acquisition, disposal, ingest, archive, storage, preservation and access services) is deployed (see the JISC reports “Keeping Research Data Safe” Part 1 and Part 2). The costs associated with different data collections are also likely to vary considerably according to how precious (rare/valuable) the stored information is and what the requirements are for access over time.

One point to note from the “Keeping research data safe” reports commissioned for JISC is that

“the costs of archiving activities (archival storage and preservation planning and actions) are consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access.”

In short – librarianship for datasets is critical to the future of science, and technology costs are the least of our concerns.

E-Books Revisited April 4, 2010

Posted by Andre Vellino in Information, User Interface.

As Canadians await their iPads for the end of April, this funny SpeedBump cartoon makes two serious points worth noting: e-book readers have poor screen resolution and digitization degrades the quality of information.

There are obvious advantages to digital information, the top three being indexing (hence search and discovery) and ease of storage and distribution. But, just as the artifacts of MP3 encoding have changed the production of music (e.g. music produced intentionally with less dynamic range, more pronounced basses and trebles), so the advent of (relatively) low-resolution (~150 DPI) monochrome (e-ink) or (~132 DPI for the iPad) color LED display devices threatens to constrain the consumption of content – scholarly journal articles especially.

At least in the short term. It isn’t until we can do “Seadragon”-like things – things that augment the dimensionality of textually and graphically represented knowledge – that electronically published and displayed information has any chance of surpassing paper.   Imagine looking at a photograph and being able to find out much, much more about it than the human eye can possibly detect, e.g. via NRC’s 3-D digital imaging of the Mona Lisa.

So it is possible to imagine a great future for the scholarly use of iPad-like devices. But as an instrument for the mere reading of text, e-book readers still have a long way to go.

Why I Bought a CD Player December 31, 2009

Posted by Andre Vellino in Attention, Information, Open Source.

My trusted 1992 NAD CD player decoded its last bits a few months ago. I wanted to replace it with something – the question was: what?

In the 21st century, one would think that the obvious answer to this question is a networked media player like the Sonos Digital Music System or the Logitech Squeezebox Duet. Music is digital now – why not store it on a server and play it back on a special purpose computer?  After all, isn’t that what even an old fashioned CD player is already?

I ended up choosing another CD player (a Marantz CD5300 to be exact) instead of a networked player. I was influenced in part by my spouse’s preference for handling tangible things (CDs). I agree with her that there’s something about taking a disc and playing it that makes the listener less “remote” from the music / composer / performer  than searching / navigating / browsing a collection of files. As well, I think selecting a disc requires a greater degree of purposeful intention for listening attentively than the selection of a play-list on an iPod-like device.  Moreover, an “album” isn’t just a random collection of songs or tracks – it is, itself, a composition of sorts. This is all the more obvious with classical music, where the unit-to-be-listened-to isn’t the movement but the Concerto or the Symphony.

One reason I considered the networked player at all is that I wanted to get rid of the clutter of a CD collection.  But the problem with ripping CDs to disk is dealing with the meta-data.  Services like the free freedb or even the commercial Gracenote are great for finding the (likely) metadata (title / composer / performer) that corresponds to your disc, but often the information offered by these services is incorrect or inconsistently specified.  There are also many redundant entries.  My experience is that the physical clutter and organization problem just gets replaced by a digital clutter and organization problem.

There are, also, the issues of quality of sound and predictability of format. Say what you will, but there is a perceptible difference between a 320kb/s MP3 encoding and the WAV data from which it was extracted. My benchmark is simple: does the MP3 encoding of the second movement of Shostakovich’s 10th symphony still send a chill up my spine? – it doesn’t.

One may retort: why don’t you use a format like FLAC, which offers 2x compression with no loss of information? One reason is – the WAV format on compact discs has been around for 30 years, FLAC only 10. And, truth be told, I have the nagging suspicion that music monopoly machines like iTunes will relegate Open Source formats like FLAC and the excellent (lossy) compression format OGG to oblivion.
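For a sense of the numbers involved, here is a back-of-the-envelope Python sketch. The CD audio parameters (44.1 kHz, 16 bits, stereo) are the standard ones; the 2x FLAC ratio is simply the rough figure quoted above, not a measurement, since actual FLAC compression varies with the music.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)

# How much a 320 kb/s MP3 has to squeeze the original signal.
mp3_ratio = cd_kbps / 320

# Approximate FLAC bitrate, assuming the rough 2x compression mentioned above.
flac_kbps = cd_kbps / 2

print(cd_kbps, round(mp3_ratio, 2), flac_kbps)
```

Uncompressed CD audio runs at about 1411 kb/s, so even the best-quality MP3 keeps less than a quarter of the bits. FLAC at 2x would still need roughly 700 kb/s, more than twice the MP3 rate, to lose nothing.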

There, I’m out of the closet as a retro-techno-laggard who doesn’t believe that WAV is an improvement over analog vinyl nor that compression formats are an improvement over 16-bit PCM encoding. I do listen to music in FLAC and OGG on one of the few portable players that supports them (the wonderful and inexpensive Sansa Clip), but I’m still keeping my CDs for the sake of posterity.

The (long tail) End of the Book December 4, 2009

Posted by Andre Vellino in CISTI, Digital library, General, Information.

I would venture to guess that Noah Richler (son of Mordecai) is not the first journalist to predict the demise of the book.  In his article in the October edition of The Walrus, Richler says:

….the book industry’s digital future, one that was comfortably far off even as the music industry was being decimated, is now ineluctably and forcefully here.

Richler asserts that instead of propagating a greater variety of books to the general public via the effect of the long tail, the web has benefited blockbusters:

Rather than connecting the public to the glorious cornucopia of the Long Tail, the effect of the web has been to serve fewer blockbusters better.

Worse, he says:

Today, just a small number of books compete for consumers’ fleeting attention. The tail is longer, but it is also thinner.

Exacerbating the problem for the publishing industry is the increasing amount of  free content, especially via Google Books:

By copying the books first and negotiating later, Google has, in effect, established itself as the biggest pirate in the world.


…digital is where the puck is going to be, and publishers have no choice but to skate toward it.

True enough, but “digital” also means “ephemeral” – which is fine for Robert Ludlum novels, but less so, perhaps for the Origin of Species and other milestones in the history of human knowledge.

Contra some bloggers who predict the end of the book by pointing to the growing sales of e-Readers like the Kindle, I think there’s a very important difference between the playback of video or  music on iPods and the reading of books on e-Readers.

First, video and audio have to be viewed or heard with some kind of playback device (cd player / tape player / video player).  Not so with print. You can read a paper book with no playback device and doing so has no up-front device costs (Kindle / iPod), no power requirements and it is invulnerable to electro-magnetic pulses. Furthermore, paper books already have all the right digital rights management (DRM) mechanisms built-in: you can borrow them, you own them and you can re-sell them.

Second, the typical “unit” in the audio industry is the 5-minute song. The typical “unit” in literature (i.e. book) is ~ 300 pages.  The amount of time and attention required to “consume” a book is several orders of magnitude greater than that required for a song. This makes a huge difference to how human beings like to ingest this content.  I can spend hours reading a book (at 600 dots per inch) – less on a low-resolution device like a Kindle.  Audio and video are much better, by comparison.

Last but not least is the impermanence inherent in digital formats and media. Entrusting our knowledge to digital formats (e.g. PDF) and media (e.g. hard disks) commits us to an ephemeral cultural and intellectual memory. While digitization has its virtues (e.g. searching, social tagging, clustering) it also harbours (invisible) dangers (e.g. digital rot, ease of forgery, dependence on rapidly changing software and hardware developed at great expense in the private sector.)

The inherent conservatism in librarianship that values, organizes and manages paper is a welcome counterweight to the near-term myopia of digital early-adoption.  Both have their place in the 21st century but I worry about our putting all our eggs in the digital basket.