Permabits and Petabytes

August 20, 2008

Are Fibre Channel and SCSI Drives More Reliable?

Filed under: Jered Floyd — jeredfloyd @ 9:04 pm

One of the adages of the storage industry has been “Fibre Channel and SCSI drives are more reliable than SATA and PATA drives”. This has always confused me. The technology in the spindles just doesn’t change that much, and in the past the difference between the SCSI and ATA models of a drive may have been as little as different drive electronics on the same spindle.

How could SCSI drives have been more reliable? Could it have something to do with them costing three times as much for the same amount of storage? Hmmm…

It used to be easy to find comparable drives in SATA and SCSI flavors, but that’s become increasingly difficult with the advent of 10K and 15K RPM drives. The drive manufacturers have created a false segmentation in the market, where 10K and 15K RPM drives are only available in SCSI, FC and SAS flavors, and almost never in SATA. Western Digital was the lone company that broke the rules of this cabal, but they seem to have been shamed back to offering only a single model, the VelociRaptor. Let’s hope the FTC decides to start looking into this.

So, today your performance requirements may lock you into paying extortionist prices for SAS drives. But, where we can make comparisons, the reliability canard doesn’t seem to hold up. I looked at the data sheets for all of Seagate, Hitachi, and Western Digital’s current drives, and they all have MTBFs quoted in the 1 million to 1.4 million hour range. Two recent studies have shown that there’s no observable difference in the failure rates of SATA and SCSI/FC drives, although both fail far more often than the manufacturers’ quoted figures would suggest. There’s no enhanced spindle reliability with the pricey drives.

Where there is a difference between models, though, is with our old friend, the bit error rate. The bit error rate is the rate at which a block just can’t be read from the disk, due to not being able to recover data from the PRML and ECC codes on the platter. The whole drive doesn’t fail, but you can’t read that block. In a RAID system, this triggers reconstruction of that block from the remaining drives.

As I describe in the video “The Trouble with RAID”, this bit error rate is the biggest problem with RAID technology today. In the event of a drive failure in a RAID 4 or RAID 5 set, every remaining drive must be read perfectly from start to finish or else data will be lost. With a 7+1 set of terabyte drives, this means 7 TB must be read. A bit error rate of 1 in 10^14 means that there’s a 44% chance that can’t be done.
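
For the curious, here’s a quick sanity check of that figure, as a back-of-the-envelope Python sketch. It assumes 1 TB = 1024 × 10^9 bytes, the same convention used in the calculation in the comments below:

import math

BER = 1e-14                           # 1 unrecoverable bit error per 10^14 bits read
bits_to_read = 7 * 1024 * 10**9 * 8   # seven surviving 1 TB drives, read end to end

# Probability that every single bit reads back cleanly; log1p keeps the
# arithmetic accurate for such a tiny per-bit error rate.
p_clean = math.exp(bits_to_read * math.log1p(-BER))

print(round(p_clean, 3))      # ~0.564: the rebuild reads everything
print(round(1 - p_clean, 3))  # ~0.436: the rebuild fails, i.e. the ~44% above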

With Seagate, every SCSI/FC drive has a bit error rate of 1 in 10^16, except for the Cheetah 10K.7, which is at 10^15 but maxes out at 300 GB. With Hitachi, every SCSI/FC drive has a bit error rate of 1 in 10^16. With Seagate, Hitachi and Western Digital, every SATA drive has a bit error rate of 1 in 10^15 or even 10^14, even with terabyte capacity drives! Not one exception.

This means that if (and only if) you’re using RAID, SCSI drives are hands-down more reliable. SCSI drives have a better bit error rate, and are smaller in capacity (so less data needs to be read for a RAID rebuild). In the worst case, the Cheetah 10K.7 drives, there’s only a 1.7% chance that a block will be lost in reconstruction. In the SATA worst case, the Hitachi Deskstar 7K1000.B or the Seagate DiamondMax 22, there’s a massive 44% chance the RAID rebuild will fail.
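
To make the comparison concrete, the same arithmetic can be wrapped in a small function. This is again just a sketch, plugging in the datasheet capacities and bit error rates quoted above:

import math

def rebuild_failure_prob(data_drives, capacity_bytes, ber):
    # Chance of at least one unrecoverable bit error while reading every
    # surviving drive from start to finish during a RAID rebuild.
    bits_read = data_drives * capacity_bytes * 8
    return 1 - math.exp(bits_read * math.log1p(-ber))

# SCSI worst case: 7+1 set of 300 GB Cheetah 10K.7 drives, BER 1 in 10^15
print(rebuild_failure_prob(7, 300 * 10**9, 1e-15))   # ~0.017, i.e. the 1.7% above

# SATA worst case: 7+1 set of 1 TB drives, BER 1 in 10^14
print(rebuild_failure_prob(7, 1024 * 10**9, 1e-14))  # ~0.44, i.e. the 44% above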

This just goes to re-emphasize that in large enterprise archives, RAID technology just doesn’t cut it anymore. That’s why Permabit developed RAIN-EC and our other on-disk data protection technologies; only these can provide the level of reliability required for large-scale, long-term data storage.

The other big mystery, though, is why these vendors aren’t shipping SATA drives with better bit error rates. Unlike spindle reliability, bit error rate is largely a software tweak — just use a few more bits of the disk for ECC purposes. Take that terabyte drive down to 750 GB of capacity and make it ten times more reliable. Drive vendors could do this tomorrow.
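
To put a rough number on that trade (my arithmetic, using the same model as the sketches above), a 7+1 set of such 750 GB drives with a tenfold better bit error rate would fare far better on a rebuild:

import math

bits_read = 7 * 750 * 10**9 * 8                        # rebuild reads 7 x 750 GB drives
p_fail = 1 - math.exp(bits_read * math.log1p(-1e-15))  # BER improved tenfold to 1 in 10^15
print(round(p_fail, 3))   # ~0.041: about a 4% rebuild failure risk, down from ~44%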

The false segmentation of drive interface and spindle speed is alive and well today, with Western Digital appearing to have largely given up. Is there a false segmentation on bit error rate too?

16 Comments »

  1. […] a guy who discusses SATA vs. SCSI disk reliability. Short conclusion: actual disk failures (MTBF) are almost exactly as likely with cheap SATA disks […]

    Pingback by The Navarra Group » Blog Archive » 2008-09-08: SATA vs. SCSI reliability — September 8, 2008 @ 12:21 pm

  2. “Unlike spindle reliability, bit error rate is largely a software tweak — just use a few more bits of the disk for ECC purposes. Take that terabyte drive down to 750 GB of capacity and make it ten times more reliable. Drive vendors could do this tomorrow.”

    Unfortunately it’s not that simple. A big part of why enterprise drives have lower BERs is that they have protection along the entire data path from the disk to the host. That means extra ECC checks built into the controller hardware/firmware, and ECC memory. This adds quite a bit to the cost of production, which is why SATA drives generally have no protection at all once the data leaves the disk.

    Comment by Steven — September 8, 2008 @ 10:16 pm

  3. Something else I forgot to mention is that enterprise drives are tested _much_ more thoroughly than SATA drives. I can tell you for a fact that the major manufacturers have shipped SATA drives with things like firmware bugs that can cause data corruption, only to find out later from their customers. That kind of thing just doesn’t fly in the enterprise market so companies spend a _lot_ more time and money testing their enterprise products than their SATA ones.

    Comment by Steven — September 8, 2008 @ 10:32 pm

  4. Hi,

    I think your calculations are not very precise. For example, Seagate declares bit error rates like “1 sector per 10e15” (a quotation from the datasheet for the inexpensive Barracuda ES.2 SATA drives). For your example with 7×1 TB drives, this gives a probability of losing one sector of ~0.0001%.

    Comment by Lev Shamardin — September 9, 2008 @ 2:48 am

  5. I’d guess that the market for consumer-grade drives is driven by price/capacity and gross capacity, rather than reliability. Most home/small-office buyers, presented with the choice of 1TB or 750GB for the same price, are going to get the 1TB. Maybe if the drive manufacturers advertised error rates (in some sort of simplified, bigger-is-better units) on the front of the boxes, that could change.

    Comment by Western Infidels — September 9, 2008 @ 11:28 am

  6. Unfortunately it’s not that simple. A big part of why enterprise drives have lower BERs is that they have protection along the entire data path from the disk to the host.

    Steven,

    You’re mixing “enterprise” and “SATA” here, and I get the point you’re making, but I don’t think it’s correct. Serial ATA has CRC checks on bus protocol frames. With both SATA and SAS you have protection of data from the drive to the host controller.

    I’ll agree that SAS controllers may be more likely to use ECC memory, or that SAS drives may maintain better ECC data internally to protect the data until it’s converted to a SAS protocol frame, but these are again manufacturing decisions on the part of the vendors involved. SATA can be just as reliable as SAS in these areas.

    Of course, both are vulnerable to data corruption between the application and the host controller. This is why application vendors like Oracle add additional ECC data at the application layer — they know they can’t fully trust any component further down the chain.

    –Jered

    Comment by jeredfloyd — September 9, 2008 @ 12:07 pm

  7. That kind of thing just doesn’t fly in the enterprise market so companies spend a _lot_ more time and money testing their enterprise products than their SATA ones.

    These same vendors offer “enterprise” SATA drives which, presumably, go through this more intensive testing. They go for around a 20% price premium over the “consumer” SATA drives. Why not just stick with one interface, then? I stand by my claim that enterprise : SAS :: consumer : SATA is a false market segmentation.

    Comment by jeredfloyd — September 9, 2008 @ 12:09 pm

  8. I think your calculations are not very precise. For example, Seagate declares bit error rates like “1 sector per 10e15” (a quotation from the datasheet for the inexpensive Barracuda ES.2 SATA drives).

    Lev,

    Yeah, the Seagate data sheets use different terminology — they call the error rate out as “1 sector” instead of “1 bit”, which would be 512 times more reliable than any other vendor! I think they might just be referring to the fact that a single bit error causes the read of the entire sector to fail: you don’t get partial data, you get a read error on the whole sector.

    Clarification from Seagate would be great here. If they really are 512X more reliable than every other drive vendor, you’d think they’d be shouting that from the treetops!

    –Jered

    Comment by jeredfloyd — September 9, 2008 @ 12:12 pm

  9. These reliability figures all come from the vendors, right? Is it possible that the actual reliability is the same, but the datasheets exaggerate the BER of the consumer drives for marketing reasons?

    I can see two reasons to do that: (a) promising high reliability makes for higher costs in warranty replacement, so it carries a premium; (b) the vendors don’t want enterprises buying consumer drives, so they make weaker statements about the consumer drives.

    This wouldn’t be illegal, or even particularly deceitful; they’re promising that the BER is no *greater* than such-and-so, so an exaggerated BER figure is still true.

    Comment by John Stracke — September 9, 2008 @ 2:52 pm

  10. Is it possible that the actual reliability is the same, but the datasheets exaggerate the BER of the consumer drives for marketing reasons?

    That’s an interesting theory. I’m doubtful, as I’ve never seen a storage vendor knowingly undersell a product on the spec sheet, but it’s possible. There was a recent study using data from NetApp that suggested that the published BERs were pretty accurate — I think it may have been “An Analysis of Data Corruption in the Storage Stack” (http://www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf).

    Comment by jeredfloyd — September 9, 2008 @ 6:00 pm

  11. […] a guy who discusses SATA vs. SCSI disk reliability. Short conclusion: actual disk failures (MTBF) are almost exactly as likely with cheap SATA disks […]

    Pingback by EconTech » 56% Chance Your Hard Drive Is Not (Fully) Readable, A Lawsuit in the Making — September 29, 2008 @ 4:51 pm

  12. “With a 7+1 set of terabyte drives, this means 7 TB must be read. A bit error rate of 1 in 10^14 means that there’s a 56% chance that can’t be done.”

    I think that should be 44%.

    Let’s work it out together.

    1 in every 10^14 bits has an error. Let’s refer to 1/10^14 as alpha for the sake of ease of notation.

    The probability that a bit has no error is thus 1-alpha.

    The probability that there will be no error while reading 7 TB (as disk manufacturers count terabytes) of data is

    (1-alpha)^(1024*1000*1000*1000*8*7), or, more practically,

    10^(1024*1000*1000*1000*8*7*log(1-alpha)).

    calc from the wonderful apcalc package can work that out for us:

    % calc '10^(7*1000*1000*1000*1024*8*log(1-(1/10^14)))'
    0.56358337345776143653

    There you go, 56%.

    The chance that it cannot be done is then 44%.

    Comment by Andras — September 29, 2008 @ 7:29 pm

  13. Andras,

    You’re correct. I was working from E, the expected number of bit errors — with 7 TB, the expected number of bit errors is 0.56. This is obviously not the same as probability of at least one failure, which is 0.44 as you say. I’ve updated the post to correct this error. 44% is still pretty scary!

    Regards,
    –Jered

    Comment by jeredfloyd — October 12, 2008 @ 5:25 pm

  14. […] a guy who discusses SATA vs. SCSI disk reliability. Short conclusion: actual disk failures (MTBF) are almost exactly as likely with cheap SATA disks […]

    Pingback by The Navarra Group » Blog Archive » SATA vs. SCSI reliability — October 24, 2008 @ 2:52 pm

  15. What about JBOD arrays?

    This RAID issue makes me want to invest in IDE RAID, heh.

    Comment by JamieIvanov — March 15, 2009 @ 9:04 am

  16. As an administrator in a little development shop, I manage more than a dozen RAIDs consisting of between 8 and 24 disks, with all kinds of sizes between 37 GB VelociRaptors and 1 TB Seagates, Samsungs, and Hitachis.

    I used to think that statistics works in mysterious ways, visiting upon us an aggregate failure rate that grows with the number of disk drives one has. But I came to the conclusion that severe exaggeration of the MTBF numbers can just as well account for the number of disks I had to replace.

    That said, I find something profoundly misleading in the mathematics in the article and in the previous comments. It is assumed that the BER/UER aggregates across drives, so given a number of drives you get pretty bad odds against you if you have to read, let’s say, 14 disks in order to rebuild a 15-way RAID 6 bucket. I’ve actually rebuilt so many arrays that I should be scared stiff by now, especially because the UER never happened. I am certainly increasing the probability that something like a UER may happen any time soon by watching the S.M.A.R.T. reports and preventively replacing a suspect drive before it stops the show, but let’s be serious: one drive has to read close to all of its 1E15 bits before it gets a UER, not 15 drives together. The likelihood definitely rises, but it is most certainly not a linear function of the number of drives.

    We need a sounder probability computation, not just panic-mongering. If there is a good aspect to this article, it is the implied warning that RAID is no replacement for regular, frequent, exhaustive backups. That and the useful pointer to RAIN-EC, whatever it is.

    Comment by dkrnic — November 16, 2009 @ 2:52 pm

