#11
Bill Todd wrote:

> "ohaya" wrote in message ...
>
> > BTW, re. the "0" MTTR, see my post back to Bill Todd. I had given 4
> > hours as an example in that post, but after posting and thinking about
> > it, given the scenario that I posed, it really seems like the MTTR
> > would be more like "0" than like 4 hours, since with my scenario, the
> > "system" never really fails (since the drives are hot-swappable).
> > Comments?
>
> If you've learned how to repopulate on the order of 100 GB of failed
> drive in zero time, especially while not seriously degrading on-going
> processing (so don't just assert that you can use anything like the full
> bandwidth of its partner to restore it), I suspect that there are many
> people who would be very interested in talking with you.

Bill,

You're right: in my mind, at least, I was ignoring any effect of restoring to a replacement drive in the case of a failed drive. But I am looking mainly at FAILURE rates (MTBF), and assuming hot-swappable drives, wouldn't the system continue to run (possibly with some performance degradation because of the restore)? Is the period of time during which the new/replacement drive is being restored normally considered "downtime", i.e., is it included in MTTR?

Jim
#12
Hi,

BTW, I wanted to mention that I really appreciate the patience you all have shown with my questions, some of which might admittedly have appeared stupid or naive, but this discussion has been VERY helpful to me, at least. So again, thanks!!

Jim
#13
In article , ohaya wrote:
> Is the period of time where the new/replacement drive is being restored
> normally considered "downtime", i.e., is it included in MTTR?

Yes. Think of it this way: if a second drive failed in that period, would the system as a whole fail? Yes. Therefore, that time has to be included in the calculation, so it must be included in the MTTR.

-- 
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All these things will be lost in time, like chalk-paintings in the rain. `-_-' Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`
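The vulnerability window described here can be put in rough numbers. As a sketch (using the 1.2 Mhour MTBF and 4 hour MTTR figures from this thread, and treating drive failures as independent with a constant rate), the chance that the surviving mirror dies during any one rebuild is about MTTR/MTBF:

```python
# Rough per-incident risk during the unmirrored rebuild window.
# Figures are the ones used in this thread; independence is assumed.
MTBF = 1.2e6  # hours, per drive
MTTR = 4.0    # hours the array runs without redundancy

p_second_failure = MTTR / MTBF  # chance the survivor fails in the window
print(f"{p_second_failure:.1e}")  # prints 3.3e-06
```

So any single rebuild is very unlikely to lose data; the later posts are about why that per-incident number is not the whole story.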
#14
"ohaya" wrote in message ... It's kind of funny, but when I first started looking, I thought that I'd find something simple. That was this weekend ... As I said in my prior post. Maintained RAID 1 failure(of the cases included) can be ignored as it's swamped by other failures in the real world. It's a great academic exercise with little practical application here. Ron, Thanks again. I'm starting to understand your 2nd sentence above . If I'm understanding what you're saying, with a RAID1 setup, with 2 drives with reasonable (i.e., 1.2Mhours) MTBF, from a design standpoint, you wouldn't be worried about failures of the drives themselves, because there are other failures/components (e.g., the processor board, etc.) that would have an MTBF much lower than the raid'ed drives themselves. Did I get that right? And many more failure sources, EXACTLY. BTW, re. the "0" MTTR, see my post back to Bill Todd. I had given 4 hours as an example in that post, but after posting and thinking about it, given the scenario that I posed, it really seems like the MTTR would be more like "0" than like 4 hours, since with my scenario, the "system" never really fails (since the drives are hot-swappable). Comments? Except for the possibility that the second drive fails before the first is replaced. But in that 4 hours I'd be more concerned about gaint meteroid impact. |
#15
In article , ohaya wrote:
> ... If the above calculation is in fact a good estimate, and just so
> that I'm clear, if:
>
> - I had a RAID 1 setup with two SCSI drives that really have an MTBF of
>   1.2 Mhours, and
> - The drives are within their "normal" lifetime (i.e., not in infant
>   mortality or end-of-life), and
> - The processor board/hardware was such that it supported hot swap, so
>   that if one of the drives failed it could be replaced without halting
>   the system, and
> - We estimated (for planning purposes) that, worst-case, it took someone
>   4 hours to detect the failure, get another identical drive, and
>   replace it (so MTTR ~4 hours),
>
> then a reasonable ballpark estimate for the "theoretical" MTTF (which is
> ~MTBF) would be:
>
>     (1.2 Mhours) x (1.2 Mhours)
>     --------------------------- = MTTF(RAID1)
>            2 x 4 hours
>
> Is that correct?

Yes. But irrelevant. And non-intuitive to boot.

First, the MTTR (repair time) has to be in there because, while a failed drive (1/2 the pair) is being repaired, the array is no longer redundant. So the only failure mode considered in this formula is the following: one drive fails; while that drive is being repaired, the second drive also fails. By "repair", we mean the time it takes to prepare another drive and copy the data from the surviving (good) drive onto it, so redundancy is restored.

By the way, you can immediately see why it is good to have a hot spare drive ready to go: if you have to wait for a human to remove the dead drive and add a new one, the typical MTTR is at least a few hours, often a day (the time it takes to alert the human and get him into the room with the spare drive). If the spare is powered up and ready to go, the typical MTTR is a few hours (it can be as short as 1 hour) to copy the data onto it.

Obviously, the simple formula (it comes from the appendix of the original Berkeley RAID paper, and already caused much hilarity back then) ignores all real-world problems, addressing only uncorrelated single-drive failure.
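The formula quoted above works out as follows (a direct sketch of the thread's own numbers, nothing more):

```python
# MTTDL estimate for a mirrored (RAID 1) pair, per the Berkeley RAID
# paper's appendix formula discussed in this post:
# MTTF(RAID1) ~= MTBF^2 / (2 * MTTR), for independent drive failures.
HOURS_PER_YEAR = 8766  # 365.25 * 24

def raid1_mttf_hours(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours ** 2 / (2 * mttr_hours)

mttf = raid1_mttf_hours(1.2e6, 4)  # the thread's figures
print(f"{mttf:.3g} hours = {mttf / HOURS_PER_YEAR:.3g} years")
# 1.8e11 hours, roughly twenty million years
```

Which is why the number looks absurdly good on paper, and why the rest of this post argues it is irrelevant in practice.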
Second, as many other people have said, this reliability calculation is completely irrelevant. Real storage systems based on RAID fail, and they do so all the time. Some fail because of simultaneous failure of two drives (some slang calls this a "RAID kill"). Some fail because, during reconstruction after a single drive failure, the surviving drive is found to have bad sectors or to be unreadable, or the extra stress of the reconstruction causes the surviving drive to fail (slang sometimes calls this a "strip kill" or "repair kill"). Many more fail due to correlated failures (for example, a faulty power supply manages to kill all the drives simultaneously).

The real source of failures, which is much, much higher than the above academic calculation, is systems issues. Within a disk array, firmware or hardware faults are commonly the source of data loss (examples: the array forgot to write dirty data back from cache, or the SCSI bus has a double-bit error that's not caught by parity checking, or in a RAID-5 XOR engine, which is sometimes implemented in hardware, the byte counter can be off). Even more realistic: the best RAIDed array in the world doesn't help you if your filesystem or database corrupts data for fun, except that the corrupt data is now stored extremely reliably.

There is a story of a company that had a complete second computer center, with all their data continuously replicated between the two. In the event of a disaster, the second computer center could, with a few seconds' notice, take over for the first one and keep running nearly seamlessly. The second computer center was located in the other tower of the World Trade Center. Oops.
If you really care about your data surviving for a long time, and maybe being continuously accessible, and maybe even being continuously accessible with good performance, you have to look at the overall design, and you have to study techniques such as logging, HSM, backup, remote mirroring, transactional storage systems, data dispersion a la Oceanstore ... In the meantime, get yourself two disks, set them up as RAID 1, and you have already made the largest single step towards a reliable system.

-- 
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _). Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
#16
"ohaya" wrote in message ... Then a reasonable ballpark estimate for the "theoretical" MTTF (which is ~MTBF) to be: (1.2Mhours)(1.2Mhours) ---------------------- = MTTF(RAID1) 2 x 4 hours Is that correct? Wow!!! Somehow, this seems "counter-intuitive" (sorry) .... Hey, *single* disks are pretty damn reliable in the kind of ideal service conditions you postulate: mirrored disks are just (reliable) squared. A 2,000,000-year RAID-1-pair MTBF sounds great, until you recognize that if you have 2,000,000 installations, about one of them will fail each year. If each site has 100 disk pairs rather than just one, then someone will lose data every 3+ days (or you'll need only 20,000 sites for about one to lose data every year). Bill, Thanks for the perspective. But, so that I'm clear, if the individual drives really have 1.2Mhours MTBF (and I think the Atlas 15K II spec sheet actually claims 1.4Mhours), then the "squared" MTBF would indicate that RAID 1 pair would be something like 1+ TRILLION hours MTBF, not 1+ MILLION hours. Have I misinterpreted something? Yes: the figures I gave above were in years, not hours. Still, I dropped a 0 while doing the calcs in my head (I think I used 10^5 rather than 10^4 for approximating hours per year): they should all be 10x as large. Ralph made a very significant comment, by the way: at such probabilities, you really have to take silent sector deterioration seriously, so the array needs to 'scrub' its data in the background to detect such deterioration while you still have a good copy left to fix it with. Otherwise, the system's mean time to data loss drops precipitously. - bill |
#17
"Bill Todd" wrote in message ... "ohaya" wrote in message ... Then a reasonable ballpark estimate for the "theoretical" MTTF (which is ~MTBF) to be: (1.2Mhours)(1.2Mhours) ---------------------- = MTTF(RAID1) 2 x 4 hours Is that correct? Wow!!! Somehow, this seems "counter-intuitive" (sorry) .... Hey, *single* disks are pretty damn reliable in the kind of ideal service conditions you postulate: mirrored disks are just (reliable) squared. A 2,000,000-year RAID-1-pair MTBF sounds great, until you recognize that if you have 2,000,000 installations, about one of them will fail each year. If each site has 100 disk pairs rather than just one, then someone will lose data every 3+ days (or you'll need only 20,000 sites for about one to lose data every year). Bill, Thanks for the perspective. But, so that I'm clear, if the individual drives really have 1.2Mhours MTBF (and I think the Atlas 15K II spec sheet actually claims 1.4Mhours), then the "squared" MTBF would indicate that RAID 1 pair would be something like 1+ TRILLION hours MTBF, not 1+ MILLION hours. Have I misinterpreted something? Yes: the figures I gave above were in years, not hours. Still, I dropped a 0 while doing the calcs in my head (I think I used 10^5 rather than 10^4 for approximating hours per year): they should all be 10x as large. Ralph made a very significant comment, by the way: at such probabilities, you really have to take silent sector deterioration seriously, so the array needs to 'scrub' its data in the background to detect such deterioration while you still have a good copy left to fix it with. Otherwise, the system's mean time to data loss drops precipitously. A first stab at that process is called nightly backup and the second stab is scheduled defrags. "silent sector deterioration" can happen but is usually an isolated sector here or there and is quite uncommon. 
Good RAID 1 will fill the new/replacement drive inspite of such a sector read error and then one is left with an operable system with an isolated read error that may be dealt with. Depending on the definition of "data loss" this issue may not count and is relatively obscure. Modern HDs are quite good at being able to read/recover their data. |
#18
"Ron Reaugh" wrote in message ... "Bill Todd" wrote in message ... .... Ralph made a very significant comment, by the way: at such probabilities, you really have to take silent sector deterioration seriously, so the array needs to 'scrub' its data in the background to detect such deterioration while you still have a good copy left to fix it with. Otherwise, the system's mean time to data loss drops precipitously. A first stab at that process is called nightly backup Nope: this will read only one of the two copies of the data, and thus decrease the probability that one is bad only by a factor of 2 (unless the array is wise enough to choose a random copy for each read, or load considerations encourage it to). Besides, the vast majority of the data will usually be known to be unchanged and hence won't be backed up at all frequently. and the second stab is scheduled defrags. Better, but there'll still often be some data that doesn't need to be moved (at least if the defrag algorithm has any brains). "silent sector deterioration" can happen but is usually an isolated sector here or there and is quite uncommon. It doesn't have to be very common or at all extensive to decrease the mean time to data loss of a RAID-1 pair from tens of millions of years to tens of thousands of years. As I noted earlier, when the number of disk pairs gets high, such a reduction becomes significant. - bill |
#19
"Bill Todd" wrote in message ... "Ron Reaugh" wrote in message ... "Bill Todd" wrote in message ... ... Ralph made a very significant comment, by the way: at such probabilities, you really have to take silent sector deterioration seriously, so the array needs to 'scrub' its data in the background to detect such deterioration while you still have a good copy left to fix it with. Otherwise, the system's mean time to data loss drops precipitously. A first stab at that process is called nightly backup Nope: this will read only one of the two copies of the data, Well, "stab" and which it will read is not necessarily always clear and may change. and thus decrease the probability that one is bad only by a factor of 2 (unless the array is wise enough to choose a random copy for each read, or load considerations encourage it to). Besides, the vast majority of the data will usually be known to be unchanged and hence won't be backed up at all frequently. Assuming incremental backups but two drive RAID 1 may very well get imaged each night. and the second stab is scheduled defrags. Better, but there'll still often be some data that doesn't need to be moved (at least if the defrag algorithm has any brains). Right but this is all about probability reducttion. "silent sector deterioration" can happen but is usually an isolated sector here or there and is quite uncommon. It doesn't have to be very common or at all extensive to decrease the mean time to data loss of a RAID-1 pair from tens of millions of years to tens of thousands of years. As I noted earlier, when the number of disk pairs gets high, such a reduction becomes significant. Does a bad sector that happens to be detected during a RAID 1 HD failure and replacement constitute any reflection on the efficacy of that recovery? I say no. Does undetected "silent sector deterioration" actually much of a threat to real world current two drive RAID 1 reliability? I say no. |
#20
"Ron Reaugh" wrote in message ... .... Does a bad sector that happens to be detected during a RAID 1 HD failure and replacement constitute any reflection on the efficacy of that recovery? I say no. And you're wrong - utterly. When you have a disk failure in your RAID-1 pair, and only *then* discover that a data sector on the surviving disk is also bad, you've lost data - i.e., 'failed'. Does undetected "silent sector deterioration" actually much of a threat to real world current two drive RAID 1 reliability? I say no. Same degree of wrongness here as well. You really need to write less and read more. - bill |