#41
"Robert Wessel" wrote in message om...
> "Ron Reaugh" wrote in message ...
>> (...) Remember that this discussion was about two drive RAID 1. (...) And
>> what percentage of "bit error" goes undetected overall system wise? (...)
>> Two drive modest configuration RAID 1 arrays are the issue.
>
> Sector faults used to occur at about the same order of magnitude as actual
> (whole) drive failures (several studies from the early/mid nineties),
> although it seems to have gotten rather worse over the last decade or so.
> This is somewhat anecdotal, but the actual total sector error rates per
> drive have probably gone up a small amount (perhaps a factor of two or
> three - which is remarkable given the much larger increase in the number
> of sectors on a drive), but hardware failure rates have gone down by an
> order of magnitude. So you lose a sector twenty or thirty times more often
> than you lose a whole drive.

That information does NOT jibe with what's being reported in the industry. Most recent HDs are detecting and self-flawing sectors before they become unreadable. When was the last time you saw any kind of failure during a BU, defrag, or copy on, say, a workstation from a sector becoming unreadable - just a single sector not associated with an overall drive failure? In 1995 and before that would happen relatively often, but now almost never.

> Without scrubbing, the MTTR is very high (since the error is never
> detected, and thus never corrected), which seriously negatively impacts
> the reliability of the array (at least for the sector in question, and
> those in the general vicinity).

Assuming what you say is true - and it's NOT.

> This is a bit dated, but: "Latent Sector Faults and Reliability of Disk
> Arrays," by HANNU H. KARI: http://www.cs.hut.fi/~hhk/phd/phd.html

Quite dated.
#42
wrote in message news:1090041101.742428@smirk...
> If you run a minor server, with a white-box PC, maybe running Linux and
> Apache, or Windows and SQLserver: The risks to this system are so huge
> (for example from incompetent administration, bad power supplies) that a
> single RAID card or RAID on the motherboard, with a pair of RAID-1 IDE
> disks, is more than adequate. Even with this pretty crappy disk setup, the
> chance that a disk failure takes you out is small, compared to the other
> risks (let's not even start on the risks that SQLserver has a bug and
> corrupts the data, which is much more likely than disk failures). In this
> realm, Ron is right: If there is a read error during a RAID-1 rebuild,
> just mark the sector as bad, and pray that it wasn't an inode, or the
> index of the database. Matter of fact, if the guy's disk is only 70% full,
> chances are 30% that the victim is an unallocated sector, and the bad
> sector will get remapped on the next write without anyone being any wiser.

Somebody here actually DOES have a clue.

> By the way, what do I run at home (I have a minor server with Linux,
> Apache, mySQL, but it isn't used for anything that involves money making):
> A single (non-RAIDed) 10K RPM Quantum SCSI disk, with nightly backups to a
> 200GB cheap IDE disk, and occasional backups to writeable CD or DLT tape
> taken offsite (whenever I feel like it).

KinWin makes some real nice ~$30 removable shock-mounted IDE HD trays; get a nice little padded case, and that big inexpensive [S]ATA HD makes for a great offsite BU option too. There are no viable tape options in the small/modest server environment today.

> I've been thinking of getting a cheapo IDE RAID card (I could probably
> swipe a used 3Ware 4-port card from the office, we used a few of them in a
> test setup and they are now gathering dust), and put 4 reasonable IDE
> disks on them (for example the 80GB 7200 RPM Seagates, which can be had
> for about $80 on sale).

Use WDCs or Maxtors.

> With four disks in a RAID-10

With the 3Ware why not RAID 5?
#43
"Bill Todd" wrote in message ...
> wrote in message news:1090041101.742428@smirk...
>> ... Malcolm and Bill: You know, Ron is right - within a certain class of
>> users and applications.
>
> I don't think you've been paying close enough attention. While Ron is
> certainly right in stating that there are a great many situations in which
> other potential risks far outweigh the risk of loss of RAID-1-protected
> data (even relatively incompetently RAID-1-protected data, as would be the
> case in a non-scrubbing array), that issue is not one of those being
> debated.

Wrong.

> Where he went completely off the rails was in suggesting that fatal sector
> deterioration is not 'failure', and is thus irrelevant (as scrubbing would
> then also be) to the calculation of the MTBF of the RAID-1 pair -
> independent of what other external (non-RAID-1-pair-related) risks may or
> may not exist in the environment in question.

Wrong. Read what he posted and what I posted rather than making up fairy tales.
#44
"Ron Reaugh" writes:
> That information does NOT jibe with what's being reported in the industry.
> Most recent HDs are detecting and self-flawing sectors before they become
> unreadable. When was the last time you saw any kind of failure during a
> BU, defrag, or copy on, say, a workstation from a sector becoming
> unreadable - just a single sector not associated with an overall drive
> failure? In 1995 and before that would happen relatively often, but now
> almost never.

I've seen it on recent IBM Travelstar laptop drives. I have one with a number of bad sectors that I took out of service when the problems appeared, and another which started developing those problems and shortly afterwards started failing completely.
#45
"Paul Rubin" wrote in message ...
> "Ron Reaugh" writes:
>> That information does NOT jibe with what's being reported in the
>> industry. Most recent HDs are detecting and self-flawing sectors before
>> they become unreadable. When was the last time you saw any kind of
>> failure during a BU, defrag, or copy on, say, a workstation from a
>> sector becoming unreadable - just a single sector not associated with an
>> overall drive failure? In 1995 and before that would happen relatively
>> often, but now almost never.
>
> I've seen it on recent IBM Travelstar laptop drives. I have one with a
> number of bad sectors that I took out of service when the problems
> appeared, and another which started developing those problems and shortly
> afterwards started failing completely.

Any 3.5" drives do that recently?
#46
ohaya wrote:
> Bill Todd wrote:
>> "ohaya" wrote in message ...
>>> ... Before I begin, I was really looking for just a kind of "ballpark"
>>> kind of "rule of thumb" for now, with as many assumptions/caveats as
>>> needed to make it simple, i.e., something like assume drives are in
>>> their "life" (the flat part of the Weibull/bathtub curve), ignore
>>> software, etc.
>>
>> The drives *have* to be in their nominal service life: once you go
>> beyond that, you won't get any meaningful numbers (because they have no
>> significance to the product, and thus the manufacturer won't have
>> performed any real testing in that life range).
>>
>>> Think of it like this: I just gave you two SCSI drives, I guarantee you
>>> their MTBF is 1.2 Mhours, which won't vary over the time period that
>>> they'll be in-service, no other hardware will ever fail (i.e., don't
>>> worry about the processor board or raid controller), and it takes ~0
>>> time to repair a failure. Given something like that, and assuming I
>>> RAID1 these two drives, what kind of MTBF would you expect over time?
>>
>> Infinite.
>>
>>> - Is it the square of the individual drive MTBF? See:
>>> http://www.phptr.com/articles/article.asp?p=28689
>>
>> No. This example applies to something like an unmanned spacecraft, where
>> no repairs or replacements can be made. Such a system has no meaningful
>> MTBF beyond its nominal service life (which will usually be much less
>> than the MTBF of even a single component, when that component is
>> something as reliable as a disk drive).
>>
>>> Or: http://tech-report.com/reviews/2001q...d/index.x?pg=2 (this one
>>> doesn't make sense if MTTR=0 == MTBF=infinity?)
>>
>> That's how it works, and this is the applicable formula to use. For
>> completeness, you'd need to factor in the fact that drives have to be
>> replaced not only when they fail but when they reach the end of their
>> nominal service life, unless you reserved an extra slot to use to build
>> the new drive's contents (effectively, temporarily creating a double
>> mirror) before taking the old drive out.
>>
>>> Or: http://www.teradataforum.com/teradat...107_214543.htm (again,
>>> don't know how MTTR=0 would work)
>>
>> The same way: though the explanation for RAID-5 MTBF is not in the usual
>> form, it's equivalent.
>>
>>> - Is it 150% the individual drive MTBF? See:
>>> http://www.zzyzx.com/products/whitep...ility_primer.pdf
>>
>> No: the comment you saw there is just some half-assed rule of thumb that
>> once again assumes no repairs are effected (and is still wrong even
>> under that assumption, though the later text that explains the value of
>> repair is qualitatively valid).
>>
>>> - Is it double the individual drive MTBF? (I don't remember where I saw
>>> this one.)
>>
>> No. The second paper that you cited has a decent explanation of why the
>> formula is what it is. If you'd like a more detailed one, check out
>> Transaction Processing: Concepts and Techniques by Jim Gray and Andreas
>> Reuter.
>
> Hi, I'm back, and I'm bottom-posting to one of the earlier posts so that
> everything is there, as this thread is getting a little long. I hope that
> this is ok?
>
> I'm still a little puzzled about your (and I think Ron's) earlier comments
> about the article from phptr.com that I linked earlier (see above), and
> I've been trying to "reconcile" that approach/methodology with the ones
> from tech-report.com and from teradataforum.com. If I try to run the
> equivalent (hypothetical) numbers through both, I get vastly different
> results. For example, if I:
>
> - assume 100,000 hours for a single drive/device, and
> - have 3 drives in RAID 1, and
> - assume 24 hours MTTR, and
> - use the tech-report.com/teradataforum.com method,
>
> I get: MTTF(RAID1) ~ 20 TRILLION hours+
>
> And, if I follow the method from phptr.com, with the same data, I get:
>
> AFR(1 drive) = 8760/100,000 = .0876
> AFR(3 drives-RAID1) = (.0876)^3 ~ .0006722
> MTBF(3 drives-RAID1) = 8760/AFR(3 drives-RAID1) ~ 13 MILLION hours+
>
> Using the method from the phptr.com page, the MTBF results are WAY less
> than with the other method.
>
> Assuming that the tech-report.com/teradataforum.com method is more
> correct, and given that the method from the phptr.com page is so wrong for
> calculating even a relatively simple RAID1 configuration, is ANY of the
> rest of the methods described in the phptr.com page a valid approach?
>
> The reason for my question is that the next thing I wanted to do was use
> the method described in the rest of the phptr.com page (i.e., in the case
> study) to do some ballpark figuring for a more extended system (with more
> than just the raided drives), similar to what was in the case study, using
> MTBF numbers that I have for components.
>
> If any of you might be able to shed some (more) light on this, I'd really
> appreciate it.
>
> Thanks again, Jim

Hi All,

Interesting thread, and I'm learning a lot. But, getting back to the original subject matter ( ), I ran across a document on the web that I think may explain what I was puzzled about. The article was written by Jeffrey S. Pattavina, and it's at: http://www.commsdesign.com/printable...cleID=18311631

In that article, the author describes several different models and scenarios for redundant systems, and describes the MTBF calculations for each, and among these descriptions you can see where the "150%" ("unmaintained systems") vs. the "square" ("maintained systems") estimates come from.

Jim
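For what it's worth, the two calculations being contrasted can be sketched in a few lines. The maintained-system formula used below, MTTF^n / (n! x MTTR^(n-1)), is the standard textbook form for an n-way mirror with repair; the linked pages aren't reproduced in this thread, so it may not match their exact formula (it lands in the hundreds of billions of hours rather than the ~20 trillion quoted), but either way it dwarfs the AFR-cubing estimate:

```python
# Sketch of the two MTBF estimates compared above.  The maintained-system
# formula here is the standard textbook one, not necessarily the exact
# formula on the tech-report.com/teradataforum.com pages.
from math import factorial

mttf_drive = 100_000.0   # hours, per-drive MTBF from the post
mttr = 24.0              # hours to repair/rebuild
n = 3                    # three-way RAID-1 mirror
hours_per_year = 8760.0

# Method 1: maintained (repairable) system.  The array only dies if the
# remaining mirrors all fail within the repair window, so MTTR appears in
# the denominator and the result is astronomically large.
mtbf_maintained = mttf_drive**n / (factorial(n) * mttr**(n - 1))

# Method 2: the phptr.com-style AFR approach.  Annualize the failure rate,
# cube it, and convert back.  This implicitly requires all three drives to
# fail within the same *year*, with no repair model at all.
afr_one = hours_per_year / mttf_drive      # ~0.0876
afr_mirror = afr_one**n                    # ~0.000672
mtbf_afr = hours_per_year / afr_mirror     # ~13 million hours

print(f"maintained-system estimate: {mtbf_maintained:.3e} hours")
print(f"AFR-cubing estimate:        {mtbf_afr:.3e} hours")
```

The gap between the two results is the whole point of the question: modeling repair (MTTR in the denominator) versus not modeling it changes the answer by four to five orders of magnitude.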
#47
"ohaya" wrote in message ...
> ... But, getting back to the original subject matter ( ), I ran across a
> document on the web that I think may explain what I was puzzled about. The
> article was written by Jeffrey S. Pattavina, and it's at:
> http://www.commsdesign.com/printable...cleID=18311631
>
> In that article, the author describes several different models and
> scenarios for redundant systems, and describes the MTBF calculations for
> each, and among these descriptions you can see where the "150%"
> ("unmaintained systems") vs. the "square" ("maintained systems") estimates
> come from.

Indeed - and I was too harsh in my comments about the article you referred to, since on more careful reading it makes it clear that they were talking about 1) systems that weren't repaired when a device failed and 2) the theoretical MTBF in the same sense that it applies to individual units (i.e., systems where it's assumed that you simply discard the system when its nominal service life expires, at some very small fraction of the MTBF, and what matters is the probability of a failure *within* that service life). Since you were explicitly asking for information about a system that *would* be repaired on device failure, that analysis did not apply. But it was not off-base for the specific situation it described.

In fact, the reference I mentioned discusses such systems (your repetition above jogged my memory), but only en route to discussing maintained systems (which any RAID typically is: you don't usually find disks on unmaintained spacecraft, since the heads won't 'fly' in a vacuum, and in any event improving the MTBF by only 50% - assuming you really aren't going to effect repairs - hardly justifies the use of a second disk).

- bill
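Incidentally, the "150% for unmaintained systems" figure is easy to check numerically: for two independent drives with exponentially distributed lifetimes of mean MTBF, an unrepaired mirror survives until the *second* failure, and E[max(X, Y)] = 1.5 x MTBF. A quick Monte Carlo sketch (illustrative only; real drive lifetimes are only roughly exponential within the flat part of the bathtub curve):

```python
# Monte Carlo check of the "150%" rule for an *unmaintained* mirrored
# pair: the pair's data survives until both drives have died, and the
# expected maximum of two i.i.d. exponential lifetimes is 1.5x the mean.
import random

random.seed(42)
mtbf = 100_000.0   # hours, per-drive
trials = 200_000

total = 0.0
for _ in range(trials):
    x = random.expovariate(1.0 / mtbf)  # lifetime of drive 1
    y = random.expovariate(1.0 / mtbf)  # lifetime of drive 2
    total += max(x, y)                  # pair dies at the second failure

print(f"simulated pair MTTF: {total / trials:,.0f} hours (expect ~150,000)")
```

With repair, by contrast, the pair only dies when the second drive fails *during the rebuild window*, which is why the maintained-system MTBF scales like MTBF^2 / (2 x MTTR) instead of a mere 1.5x.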
#48
> Indeed - and I was too harsh in my comments about the article you referred
> to, since on more careful reading it makes it clear that they were talking
> about 1) systems that weren't repaired when a device failed and 2) the
> theoretical MTBF in the same sense that it applies to individual units
> (i.e., systems where it's assumed that you simply discard the system when
> its nominal service life expires, at some very small fraction of the MTBF,
> and what matters is the probability of a failure *within* that service
> life). Since you were explicitly asking for information about a system
> that *would* be repaired on device failure, that analysis did not apply.
> But it was not off-base for the specific situation it described.
>
> In fact, the reference I mentioned discusses such systems (your repetition
> above jogged my memory), but only en route to discussing maintained
> systems (which any RAID typically is: you don't usually find disks on
> unmaintained spacecraft, since the heads won't 'fly' in a vacuum, and in
> any event improving the MTBF by only 50% - assuming you really aren't
> going to effect repairs - hardly justifies the use of a second disk).

Bill,

Thanks. Now for the "kicker" (and I hope that I don't get flamed for this )... I don't remember if this was one of the links that I posted: http://www.sun.com/blueprints/0602/816-5132-10.pdf

In the above article, the author examines the reliability of several different configurations, and from looking at the way that he calculates AFR and MTBF for redundant drives, it looks like it's an approximation somewhere between "hot-standby-maintained" and "cold-standby-maintained".

At the beginning of that article, he does a calculation of the AFR for a redundant pair of drives assuming an MTBF of 100,000 hours for the individual drives. Then he goes through MTBF calculations for 3 different configurations with drives with an MTBF of 1,000,000 hours.

If you take the MTBF of the simple redundant pair (which, again, assumed 100,000 hours), I get an MTBF(System) of about 13,031,836 hours. Now, if you look at the calculations that he does for the 3 configurations in his case study (again, he used 1,000,000 hours for the drive MTBF here, rather than 100,000 hours), the best MTBF was for Architecture 2: 1,752,000 hours. So we have:

MTBF(RAID1 pair of 100,000 hour drives) = 13 Mhours
MTBF(Architecture 2 w/1,000,000 hour drives) = 1.7 Mhours

My question now is: given that the MTBF estimate for the simple RAID1 pair of drives (even with 100,000-hour drives) is so much higher, purely FROM A RELIABILITY standpoint, why would anyone ever consider a SAN-type storage architecture over a RAID1 pair of drives (again, this question is purely from the standpoint of reliability)?

I've done several spreadsheets following the model in the above article, and it's very difficult (actually, it might be impossible) to get the MTBF of any of the architectures in the case study to come even remotely close to the MTBF of the simple RAID1 pair (even using 100,000 hours for the simple pair). If you add the fact that when organizations go to SANs, they oftentimes also have a goal of centralizing all storage for their entire organization onto the SANs ("eggs in one basket" ), I'm even more puzzled by this...

The only rationale that I can come up with is that the non-reliability benefits of going to a centralized SAN-type store must outweigh the loss of reliability. Comments??

Thanks again, Jim
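One mechanical reason the larger architectures can't come close to the bare pair is worth sketching: every extra single-point-of-failure component in the data path sits in *series* with the mirror, and for series components the failure rates (1/MTBF) simply add, so they quickly dominate the mirror's astronomically large MTBF. The component list and MTBF figures below are hypothetical, not the ones from the Sun blueprint:

```python
# Why extra infrastructure drags system MTBF far below a mirrored pair's:
# series failure rates add.  Component names and MTBFs are made up for
# illustration; they are not taken from the Sun article.

pair_mtbf = 13_000_000.0  # hours, mirrored-pair estimate from the thread

# Hypothetical unredundant components in the data path:
series_components = {
    "RAID controller": 500_000.0,
    "FC switch":       300_000.0,
    "HBA":             400_000.0,
}

# For components in series, rates (1/MTBF) add; invert the sum to get
# the system MTBF.
total_rate = 1.0 / pair_mtbf + sum(1.0 / m for m in series_components.values())
system_mtbf = 1.0 / total_rate

print(f"system MTBF: {system_mtbf:,.0f} hours")  # far below the pair's 13M
```

The mirror contributes almost nothing to the total failure rate; the surrounding hardware does, which is consistent with the observation that no case-study architecture gets "even remotely close" to the simple pair.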
#49
In article ,
Ron Reaugh wrote:
> ....
> With the 3Ware why not RAID 5?

With a file system workload, I hate the small-write penalty (the read-modify-write cycle necessary for parity update). If I were doing exclusively or mostly large files, or using a professional-strength file system (one that allocates space in a sensible and RAID-5-friendly manner), or running a database (excluding TPC-C style small updates), or if the incremental cost of storage between RAID-1 and RAID-5 were an issue for me, I would change my mind and be OK with RAID-5. But in my case, whether 4 disks give me 2x or 3x the capacity of a single disk is pretty irrelevant (I'm having a hard time even filling my current disks), so there is no reason to risk the speed hit that comes from RAID-5.

Once again, I'm not saying that everyone should pick RAID-1. But people who don't care about the capacity difference between RAID-1 and RAID-5 should always pick RAID-1. I bet that in most commercial environments, this situation doesn't arise often.

-- 
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _). Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
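The small-write penalty mentioned here can be shown in a few lines of toy code: updating one data block in a RAID-5 stripe costs two reads and two writes, because the new parity is derived incrementally from the old data and old parity. This is the standard read-modify-write parity update, demonstrated over toy byte strings rather than real disk blocks:

```python
# Illustration of the RAID-5 small-write (read-modify-write) penalty:
# updating one data block means (1) read old data, (2) read old parity,
# (3) write new data, (4) write new parity -- four I/Os where a RAID-1
# mirror needs only two writes.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A toy 3-disk stripe: two data blocks plus their parity.
d0 = b"\x10\x20\x30\x40"
d1 = b"\x0a\x0b\x0c\x0d"
parity = xor_blocks(d0, d1)

# Small write: replace d0 with new contents.
new_d0 = b"\xff\x00\xff\x00"
# Reads (1) and (2) fetch d0 and parity; then compute:
new_parity = xor_blocks(xor_blocks(parity, d0), new_d0)
# Writes (3) and (4) put new_d0 and new_parity back on disk.

# The incrementally updated parity matches a full stripe recompute:
assert new_parity == xor_blocks(new_d0, d1)
```

Large sequential writes avoid this because a full stripe can be written with freshly computed parity and no reads at all, which is why the penalty matters mainly for file-system and TPC-C-style small-update workloads.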
#50
In article ,
Ron Reaugh wrote:
> ....
> That information does NOT jibe with what's being reported in the industry.
> Most recent HDs are detecting and self-flawing sectors before they become
> unreadable. When was the last time you saw any kind of failure during a
> BU, defrag, or copy on, say, a workstation from a sector becoming
> unreadable - just a single sector not associated with an overall drive
> failure? In 1995 and before that would happen relatively often, but now
> almost never.

True during write: if the sector is found unreadable (unable to sync or unable to servo), it will be quietly remapped. That's why write errors have become just about non-existent. The only time you get them is if the drive is out of spare sectors (at which point things are probably going to hell in a handbasket anyhow, and the drive will usually completely fail soon thereafter).

Not true during read. You will get errors during read. What is true: drives will take sectors that can still be read but are marginal, and proactively remap them. This does reduce (maybe even greatly reduce) the rate of read errors. In particular, it gives scrubbing more traction. But it doesn't always help.

The counter-balancing effect is that disks are getting larger rather quickly, while the IO bandwidth is increasing more slowly. This means that, statistically, a smaller fraction of the disk is being read, so there is more opportunity for sectors to rot away unnoticed. If you have a single disk, and read it just a little bit (on a single-user system or small server), you are statistically unlikely to see read errors anyhow. If your disk is 100% busy for years, you will see read errors.

Try this for fun: get a few hundred disks (I think I used to have about 150 disks in my previous lab), and run them flat out for a few months. You will see single-sector read errors. Not one or two, but many. Or go to a major disk vendor, and buy a few million disks. They will probably give you failure statistics that are not available to normal humans. And if you buy this many disks, you probably have your own QC department, and you study disk reliability. So you probably know quite well how often individual sectors fail. But you will not (and legally can not) release this information to the public.

-- 
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _). Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
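The scrubbing that keeps coming up in this thread can be sketched as a background pass that reads every block of both mirrors, so latent read errors are found (and repaired from the good copy) long before a rebuild needs that block. This is an illustrative toy model, not any real controller's implementation; the "disks" are just lists, with None standing in for an unreadable block:

```python
# Toy sketch of RAID-1 scrubbing: read every block of both mirrors; a
# block unreadable on one mirror but good on the other is rewritten from
# the good copy (which on a real drive triggers sector remapping).

def scrub_pair(disk_a, disk_b):
    """Scan both mirrors; return the number of latent errors repaired."""
    repaired = 0
    for i in range(len(disk_a)):
        a, b = disk_a[i], disk_b[i]
        if a is None and b is not None:
            disk_a[i] = b          # rewrite from the good mirror
            repaired += 1
        elif b is None and a is not None:
            disk_b[i] = a
            repaired += 1
        # If both copies are unreadable, that's data loss -- exactly the
        # case scrubbing is meant to make rare by shrinking the window
        # during which a latent error can pair up with a second failure.
    return repaired

# Example: a mirror with one latent bad block on each side.
a = [b"d0", None, b"d2", b"d3"]
b = [b"d0", b"d1", b"d2", None]
fixed = scrub_pair(a, b)
print(fixed, a == b)  # both mirrors consistent again
```

Without such a pass, a latent bad sector's effective MTTR is unbounded (it is never detected, hence never repaired), which is the point made in post #41 about unscrubbed arrays.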