After my post about dirty little secrets a few weeks ago, Joe Martins from Data Mobility Group wrote to point out the real “dirty little secret” about archive systems: even if your archival storage is reliable, it doesn’t mean you can do anything useful with your data once you retrieve it in the distant future.
There’s more to a digital archive than just being able to store and retrieve your bits from media. If your storage system has been designed properly, it will give you back your data, but it won’t necessarily give you the information that data represents. For several years I co-chaired the SNIA Long-Term Archive and Compliance Storage Initiative, and this was a problem that we frequently considered. The challenges we found when considering how to solve it led in part to the development of XAM, the new eXtensible Access Method standard for object-based information storage.
When it comes time to retrieve and process data that was written a long time prior, there are two major challenges, which I like to call physical readability and logical readability. Physical readability means that the archive system is able to retrieve and present the exact bitstream that was originally written: intact, complete, and with no errors. Logical readability, on the other hand, means that I am able to extract the same semantic meaning from those bits as when they were originally processed. The first problem is one that can be solved purely by technology; the second, sadly, cannot.
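To make the distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any particular archive system or XAM works: the helper names (`fingerprint`, `is_physically_readable`) and the sample record are hypothetical, and SHA-256 is simply one reasonable choice of fixity check. The sketch shows a bitstream that comes back bit-for-bit intact, yet loses its meaning because the reader guesses the wrong text encoding.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest recorded at write time (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def is_physically_readable(retrieved: bytes, recorded_digest: str) -> bool:
    """Physical readability: the retrieved bits match what was written."""
    return fingerprint(retrieved) == recorded_digest


# At write time: a record encoded as Latin-1 (an assumption for this example).
original = "caf\u00e9 menu, 1998".encode("latin-1")
digest_at_write = fingerprint(original)

# Years later: the archive hands back exactly the same bytes.
retrieved = bytes(original)
physically_ok = is_physically_readable(retrieved, digest_at_write)

# The reader no longer knows the encoding and assumes UTF-8.
as_written = retrieved.decode("latin-1")
as_guessed = retrieved.decode("utf-8", errors="replace")
logically_ok = as_guessed == as_written

print("physically readable:", physically_ok)
print("logically readable: ", logically_ok)
```

The checksum can confirm that the bits survived, but nothing in the bitstream itself records that it was Latin-1 text. That knowledge has to be preserved alongside the data, which is where technology alone stops being enough.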