Permabits and Petabytes

August 16, 2008

Two Types of Archives

Filed under: Jered Floyd — jeredfloyd @ 4:15 pm

Blocks and Files has rapidly become one of my favo(u)rite sources of daily storage news, partly for its content and partly for the understated, cynical British humo(u)r that pervades it, continuing the tradition of more general tech news sites like The Register and The Inquirer.

Most recently they have published two related articles on the maturation of archive technology in the enterprise, and I think both are pretty much spot on. The first, “Archive Layer Cake” by editor Chris Mellor, primarily highlights the vertical integration of the technologies that bring data from its primary source to its archival residence. The second, “The Evolution of the Archive” by Plasmon marketing director Steve Tongish, elaborates on the horizontal archive consolidation that Chris touches on in the first. We are seeing both trends here at Permabit as well, and we agree with these views of the future of archiving, as evidenced by our recent partnership with archive software vendor Atempo.

Archives v. Backups

As Mellor points out early in his piece, archive is fundamentally not backup; or rather, backup makes a very poor archive. Backups exist primarily for disaster recovery. The industry lumps nearly everything related to reliability under the name “data protection”, as in protection against “oops, my disk crashed” or “oops, I deleted last year’s financial reports”, but in reality there are many sub-components. Data is written to backups to guard against user errors or catastrophic events, with the general expectation that it will never be read. In most environments this data is treated with a matching level of priority, which of course leads to unpleasant surprises when it is actually needed…

Archives, on the other hand, preserve the final copy of data that was perhaps previously stored elsewhere. There’s no guarantee that archived data will ever be retrieved, but when it is needed it becomes critical, and it is unlikely to be recoverable from any other source. As George Crump describes in “The Enterprise Archive Defined”, the requirements for reliability, accessibility and scalability differ greatly from those of the backup use case. The growing understanding that archive and backup are different is leading to centralized archive initiatives within many large businesses.

Having established that enterprise archives are a different animal, both authors point to a need to manage data more holistically, considering its origin and metadata for long-term management. Here I both agree and disagree, because I think there are two types of customers.

To Classify or Not?

Early on, Tongish points to the fabled data classification solutions that will seek out data and migrate it to an archive tier. I’ve yet to see this Loch Ness Migrator. We used to call all of this Information Lifecycle Management (ILM), before vendors spent whatever currency that term once had. ILM ended up as an excuse for vendors to sell more of their storage and then bundle millions of dollars of professional services on top, in the name of eventually saving you money. By and large, this failed. Old-fashioned HSM (hierarchical storage management) works just fine, but the magic pixie dust of ILM hasn’t performed. Data classification at a per-record level is expensive and complicated. It’s like waxing your car after every drive to try to improve your fuel economy: the gains are overwhelmed by the expense.

Don’t take this to mean that data movers are useless! Quite the contrary: I believe strongly in vertical integration, but at a less granular level. Basic policy-driven data movement and HSM make a lot of sense, as does consolidating application-specific archiving for uses like email or expense report data. Additionally, the end-user-driven archiving in Atempo Digital Archive lets your own users drive a sort of “classification” process, far more cheaply than a three-letter company will sell it to you.

That’s one type of customer. For the other, applications are consolidating onto archive tiers without any need for complicated data movement schemes, because the archive tier is all that’s needed. Take a look at Jerome Wendt’s latest piece, “Is It Time to Swap Corporate File Servers for Disk-Based Archiving Systems?”. Over the course of three months, the NetApp customer described there did less I/O than a Permabit Enterprise Archive customer does in one day!

What this means is that an archive tier does not have to be a second tier to which data migrates once it’s no longer needed. An archive tier can sit right alongside your high-performance primary storage, and applications can be placed wherever they are most cost-effectively hosted. Your production database isn’t going to run directly on Enterprise Archive storage, but many of your other applications will. The archive tier is distinguished from the primary tier by its long-term reliability and ease of use, and above all by the massive cost savings realized by using it instead.

For this second type of customer, deploying an archive tier of storage is a snap. Just as VTL is easy to drop into an existing tape environment, a NAS-based disk archive is simple to drop into an existing primary environment. It acts as a seamless replacement for expensive primary storage used for hosting file systems, and saves money on day one by eliminating primary storage growth and primary-tier backup. I’ll talk about cost more in a future post.

Cross-layer alliances, as Mellor describes, are critical to enabling archiving for the large population of customers that want to save money and increase the accessibility of their archive data. We fully support that through our close partnership with Atempo and other archive software partners. For customers with simpler, per-application needs, however, let’s not forget the lesson of VTL: drop-in solutions are easy to deploy and solve immediate pain. When considering how to save money in your storage environment, look first at moving applications directly onto a solution like Permabit Enterprise Archive.



  1. […] than simple backup, where do you go from there? Permabit’s Jered Floyd thinks it all depends on what kind of enterprise you are. On the one hand, there are those with largely disparate, independent applications that need to sort […]

    Pingback by The Fundamental Nature of Archiving - Data Center Central — August 28, 2008 @ 5:00 pm

  2. […] few months ago, in Two Types of Archives, I commented on a similar article, Archive Layer Cake by Chris Mellor over at Blocks & Files […]

    Pingback by George Crump on Archives « Permabits and Petabytes — November 20, 2008 @ 12:50 pm
