Buried in the middle of an article earlier this week on problems with MozyPro restore performance was an interesting nugget:
A wildfire in July caused Santa Barbara to be hit with several power outages, which led to the failure last week of one of three drives in a Teddy Bear server’s RAID group. Before a replacement drive could be installed, another drive in the group failed, and the foundation’s data was lost.
This is a far more interesting story than a cloud backup service failing to perform (as if anyone should be surprised there, given the recent string of cloud-storage growing pains), because it’s a publicly documented double drive failure. Why do I find that interesting?
One of the big benefits of Permabit’s RAIN-EC technology is that it allows us to protect our customers against multiple, simultaneous drive failures. Not only can we recover from two failed drives (or nodes), we can also recover from uncorrectable block read errors during that reconstruction process, something RAID 6 can’t do. Jerome Wendt wrote about this earlier in the year and created a bit of an uproar among RAID vendors, but what he said is absolutely right.
As our RAIN-EC white paper explains, RAID 6 wasn’t really introduced to protect against two simultaneous drive failures; the expectation is (rightly) that this should be a rare occurrence. Instead, RAID 6 addresses the bit error rate of drives: the rate at which a block simply can’t be read. RAID 4 and 5 were reaching the point where, after a drive failure, you were all but certain to encounter an unreadable block during rebuild, and data would be lost. Even NetApp says so.
Our white paper goes into more detail on the specifics, but the problem is that drive and array capacities have been growing faster than the statistical bit error rate of the drives has been improving. A bit error rate of 1 in 10^14 sounds really good, but 10^14 bits works out to only 12.5 TB of data read. With 1 TB drives on the market, it becomes quite likely that you’ll hit an unreadable block during a rebuild, the one time you have to be able to read every remaining block perfectly.
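To see how quickly the odds turn against you, here’s a back-of-the-envelope calculation. The group size and drive capacity are illustrative assumptions, not anyone’s published configuration, and it assumes independent, uniformly distributed bit errors (real drives cluster errors, so if anything this is optimistic):

```python
# Back-of-the-envelope odds of hitting an unreadable block while
# rebuilding after a single drive failure. Assumes independent,
# uniformly distributed bit errors at the published rate.

BIT_ERROR_RATE = 1e-14    # one unreadable bit per 10^14 bits read
DRIVE_TB = 1.0            # capacity of each drive, in TB (assumed)
SURVIVING_DRIVES = 7      # e.g. an 8-drive RAID 5 group after one failure

bits_read = SURVIVING_DRIVES * DRIVE_TB * 1e12 * 8
p_clean = (1 - BIT_ERROR_RATE) ** bits_read

print(f"bits read during rebuild: {bits_read:.2e}")
print(f"P(clean rebuild):         {p_clean:.1%}")
print(f"P(unreadable block hit):  {1 - p_clean:.1%}")
```

With those assumed numbers the rebuild reads about 5.6 × 10^13 bits, and the chance of hitting at least one unreadable block comes out above 40%. That’s not a tail risk; it’s close to a coin flip.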
RAID 6 fixes this problem. After a drive failure, the system rebuilds. If it encounters an unreadable block along the way, it recovers that block from the secondary parity. If you get a double drive failure, though, you’re out of luck: there’s no parity left to recover those unreadable blocks.
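A toy erasure count makes the limit concrete. This is my own illustration, not any vendor’s code: a stripe with m parity blocks can solve for at most m missing blocks, so two dead drives plus one unreadable block is three erasures, one more than RAID 6 can handle.

```python
# Toy erasure accounting for a single stripe. A RAID level with
# m parity blocks can reconstruct at most m missing (erased) blocks
# per stripe; which blocks hold parity doesn't change the count.

def stripe_recoverable(parity_blocks: int, failed_drives: int,
                       unreadable_blocks: int) -> bool:
    """Each failed drive erases one block of the stripe; each
    unreadable sector on a surviving drive erases one more."""
    erasures = failed_drives + unreadable_blocks
    return erasures <= parity_blocks

scenarios = [
    ("RAID 5, 1 dead drive, clean rebuild",  1, 1, 0),
    ("RAID 5, 1 dead drive + 1 unreadable",  1, 1, 1),
    ("RAID 6, 1 dead drive + 1 unreadable",  2, 1, 1),
    ("RAID 6, 2 dead drives, clean rebuild", 2, 2, 0),
    ("RAID 6, 2 dead drives + 1 unreadable", 2, 2, 1),
]
for name, m, dead, ure in scenarios:
    verdict = "recovers" if stripe_recoverable(m, dead, ure) else "DATA LOSS"
    print(f"{name:40s} -> {verdict}")
```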
Permabit’s Enterprise Archive maintains erasure code (similar to parity) information to recover from multiple drive failures. On top of that, we also maintain additional recovery information on each disk. This recovery information consumes less than 0.1% of the disk, yet it allows us to recover from up to 4 KB of unreadable data on a drive, without having to fail that drive and without having to pull the data from other sources in the system.
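The white paper doesn’t publish the on-disk layout, so the following is only one way to picture intra-drive recovery information: a small XOR parity block kept per group of data blocks on the same disk. The 4 KB block size and 1024-block group size below are my assumptions, chosen so the overhead lands under 0.1%; this is not RAIN-EC’s actual scheme.

```python
# Illustrative only: intra-drive recovery information modeled as one
# XOR parity block per group of data blocks on the same disk.
# Permabit's actual RAIN-EC layout is not public; BLOCK and GROUP are
# assumed values chosen to keep overhead below 0.1%.

BLOCK = 4096      # bytes per block (assumed)
GROUP = 1024      # data blocks covered by one local parity block

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(BLOCK, "big")

data = [bytes([d % 256]) * BLOCK for d in range(GROUP)]
parity = xor_blocks(data)

# One unreadable 4 KB block is rebuilt from its neighbors plus the
# local parity, without failing the drive or touching other nodes.
lost = 37
rebuilt = xor_blocks(data[:lost] + data[lost + 1:] + [parity])
assert rebuilt == data[lost]
print(f"recovered block {lost}; overhead = {1 / (GROUP + 1):.3%}")
```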
This means that even with two drives failed, unreadable blocks hit during the rebuild are still recoverable. With RAID 6, they’re not.
Now back to the anecdote that I started with. When I bring up the scenario of RAID 6 failing to recover from a double drive failure, I often hear (usually from other vendors), “But that’s never going to happen!” Even though those same vendors’ marketing literature sells RAID 6 as designed to recover from a double failure, they’re right that it’s statistically unlikely, given published drive MTBFs upwards of 1 million hours.
The problem is, those MTBFs are just estimates. A RAID rebuild stresses drives and induces failures more often than the statistics would predict. As disks age, failures become more likely. And vibrational coupling of drives within a single RAID chassis can lead to near-simultaneous failures.
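For what it’s worth, here is the naive version of the vendors’ math, under exactly the independence assumption that rebuild stress, age, and shared vibration break in practice. The rebuild window and group size are assumptions of mine:

```python
import math

# Naive chance of a second drive dying during a rebuild, assuming
# independent exponential failures at the published MTBF. The point
# above is that real-world failures are correlated, so the observed
# rate is far higher than this.

MTBF_HOURS = 1_000_000    # published spec
REBUILD_HOURS = 24        # assumed rebuild window
SURVIVORS = 7             # remaining drives in the group (assumed)

p_one = 1 - math.exp(-REBUILD_HOURS / MTBF_HOURS)   # one given drive dies
p_any = 1 - (1 - p_one) ** SURVIVORS                # any survivor dies

print(f"P(second failure during rebuild) = {p_any:.4%}")
```

On paper that’s a few hundredths of a percent per rebuild, which is why vendors wave the scenario away; the field data, as the Santa Barbara story shows, says otherwise.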
Double drive failures do happen in the real world, much more frequently than the numbers would suggest. When that happens, Permabit Enterprise Archive with RAIN-EC protects you in ways that RAID 6 fundamentally can’t.