Permabits and Petabytes

December 11, 2008

No Silver Bullet: Format Best Practices

Filed under: Jered Floyd — jeredfloyd @ 3:47 pm

In the first post of this series, I introduced the concepts of physical versus logical readability and explained how getting back your bits in 100 years is a hard problem, but one with solid product and technology solutions. Last post, I explained why there’s no simple solution to being able to turn those bits back into information, but there are ways to avoid the pitfalls through careful planning.

So how can you solve the logical readability problem? Primarily by following best practices for data format preservation. Some best practices: (more…)


December 9, 2008

Cutting Costs With Enterprise Archive

Filed under: Jered Floyd — jeredfloyd @ 3:27 pm

A few things I’ve spotted around the web: Tony Pearson was at the Gartner Data Center Conference last week in Las Vegas. We were there too, and it was an absolutely fantastic show. I didn’t get to go myself, but the reports I have back say it was full of people who were fanatical about saving money on their storage, not just concerned with where the next steak dinner at the show would be.

The best quote that Tony provides is from a lunch talk: (more…)

December 5, 2008

No Silver Bullet: Logical Readability

Filed under: Jered Floyd — jeredfloyd @ 5:32 pm

In my last post in this series I introduced the concepts of physical versus logical readability and explained how getting back your bits in 100 years is a hard problem in itself, but solving it is not sufficient on its own for a complete archive. Being able to accurately store and retrieve bits (maintaining physical readability) over a long period of time is critical to an archive, as is being able to do so cost-effectively, but it is not enough. Logical readability, the ability to interpret what those bits mean, must be maintained as well, and this is a much harder problem that cannot be solved by technological means alone.

Modern electronic storage consists of binary data, ones and zeros. The physical encodings are complex and analog in nature and change frequently with advances in technology, but the data represented is always binary. This has not always been the case, as with the analog tapes from the Lunar Orbiter that I wrote about last time, but for fundamental mathematical reasons data is almost certain to be representable in binary going forward. Storing and retrieving a bitstream is the physical readability challenge. (more…)

December 3, 2008

No Silver Bullet: Archive Challenges

Filed under: Jered Floyd — jeredfloyd @ 5:40 pm

After my post about dirty little secrets a few weeks ago, Joe Martins from Data Mobility Group wrote to point out the real “dirty little secret” about archive systems: even if your archival storage is reliable, it doesn’t mean you can do anything useful with your data once you retrieve it in the distant future.

There’s more to a digital archive than just being able to store and retrieve your bits from media. If your storage system has been designed properly then it will give you your data, but it won’t necessarily give you the information that data represents. For several years I co-chaired the SNIA Long-Term Archive and Compliance Storage Initiative, and this was a problem that we frequently considered. The challenges found when considering how to solve this problem led in part to the development of XAM, the new eXtensible Access Method standard for object-based information storage.

When it comes time to retrieve and process data that was written a long time prior, there are two major challenges — what I like to call physical readability and logical readability. Physical readability means that the archive system is able to retrieve and present the exact bitstream that was originally written, intact, complete, with no errors. Logical readability, on the other hand, means that I am able to extract the same semantic meaning from those bits as when they were originally processed. The first problem is one that can be solved purely by technology; the second one, sadly not. (more…)
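
As a rough illustration of the distinction, here is a minimal Python sketch. It is only an assumed example, not how any particular archive product works: the `fingerprint` helper, the sample text, and the choice of SHA-256 and of the two text encodings are all hypothetical. It shows a bitstream that comes back intact (physical readability holds) and yet yields the wrong information when interpreted with the wrong format assumption (logical readability is lost).

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest used here as a simple fixity check (illustrative helper)."""
    return hashlib.sha256(data).hexdigest()


# What was originally written, and the digest recorded at write time.
original_text = "café"
stored = original_text.encode("utf-8")
recorded_digest = fingerprint(stored)

# Much later: the archive hands back the bytes (simulated here with a copy).
retrieved = bytes(stored)

# Physical readability: the exact bitstream came back, so the digests match.
physically_readable = fingerprint(retrieved) == recorded_digest

# Logical readability: the same bits mean different things depending on
# which format the reader assumes.
as_utf8 = retrieved.decode("utf-8")      # the format the writer actually used
as_latin1 = retrieved.decode("latin-1")  # a plausible but wrong guess

print(physically_readable)
print(as_utf8 == original_text)
print(as_latin1 == original_text)
```

The digest comparison can be fully automated, which is why the first problem is a technology problem. Nothing in the bytes themselves says which decoding is the right one; that knowledge has to be preserved alongside the data.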
