DISAPPEARING DATA….Late Night Thoughts has a post today about one of my favorite subjects, the transient nature of information today:
In the late 1970s, the Census Bureau discovered that the aggregated data from the 1960s Census could be read only using an UNIVAC Type II-A tape drive. At the time, there were only two of those in existence: one in Japan, and one in the Smithsonian Museum! A massive data rescue effort was mounted and by 1979 the data had been recovered….
Back when I was in the document imaging business this was a well-known but rarely mentioned problem. Instead, when the topic of data storage came up, it was usually treated as a purely technical subject: magnetic tape starts to deteriorate in 10 years, for example, while an optical disk has a lifetime of 100 years.
Physical media capabilities are important, but even more important is the logical structure of the data. If you wrote a manuscript on an 8″ floppy on a TRS-80 Model II twenty years ago, it wouldn’t matter whether the floppy itself was still in perfect shape, because you’d have nothing to read it with. And even if you somehow dug up an old Model II somewhere, you’d need to have a copy of Scripsit, the word processor of choice for TRS-80s. And even if you found that, and somehow managed to transfer the data over a serial port (thank God for RS-232!), you’d still have a file in a format that no modern PC software can open.
For anyone who cares about preserving data for more than a decade or two (a librarian like Emma, for example) this is a huge problem. Even if you do a good job of recopying data every decade or so onto fresh media, what are the odds that the files themselves can still be read? Will JPEG still be an image standard in 2030? How about HTML? Or even ASCII?
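To make the distinction concrete, here is a minimal Python sketch of what a "recopy to fresh media" step might look like. It is only an illustration under my own assumptions: the function names (`sha256_of`, `guess_format`, `recopy`) and the tiny signature table are hypothetical, not anything an archive actually uses. The checksum confirms the bits survived the copy; the signature check only tells you what format the file claims to be, not whether software that understands it will still exist.

```python
import hashlib
import shutil

# Leading "magic" bytes for a few common formats.
# Illustrative only; a real archive would track far more than this.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
}


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guess_format(path):
    """Guess a file's format from its first bytes, or return 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def recopy(src, dst):
    """Copy src to dst, verify the copy bit for bit, and report the format."""
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError("copy of %s does not match the original" % src)
    return guess_format(dst)
```

A perfect checksum on my old Scripsit files would have come back "unknown" here, which is the whole problem: the physical copy can be flawless and the data still unreadable.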
The document imaging industry is dedicated to bringing about the paperless office, but the oldest joke in the business is that the paperless office will arrive at about the same time as the paperless bathroom. All things considered, that’s probably a good thing.
POSTSCRIPT: My example above was not chosen at random: a few years ago I faced exactly that problem with some old TRS-80 files. My solution? Luckily I had paper copies, so I scanned ’em and used OCR to read the text. If I hadn’t had the paper copies, I would have been completely up the creek.