#21
"Bill Todd" wrote in message ... "Ron Reaugh" wrote in message ... ... Does a bad sector that happens to be detected during a RAID 1 HD failure and replacement constitute any reflection on the efficacy of that recovery? I say no. And you're wrong - utterly. When you have a disk failure in your RAID-1 pair, and only *then* discover that a data sector on the surviving disk is also bad, you've lost data - i.e., 'failed'. That's not my definition of failed. Does undetected "silent sector deterioration" actually much of a threat to real world current two drive RAID 1 reliability? I say no. Same degree of wrongness here as well. Nope. |
#22
Bill Todd wrote:
> "ohaya" wrote in message ...
>> Before I begin: I was really looking for just a "ballpark" rule of thumb for now, with as many assumptions/caveats as needed to make it simple - i.e., assume the drives are in their useful life (the flat part of the Weibull/bathtub curve), ignore software, etc.
>
> The drives *have* to be in their nominal service life: once you go beyond that, you won't get any meaningful numbers (because they have no significance to the product, and thus the manufacturer won't have performed any real testing in that life range).
>
>> Think of it like this: I just gave you two SCSI drives. I guarantee you their MTBF is 1.2 Mhours, which won't vary over the time period that they'll be in service; no other hardware will ever fail (i.e., don't worry about the processor board or RAID controller); and it takes ~0 time to repair a failure. Given something like that, and assuming I RAID 1 these two drives, what kind of MTBF would you expect over time?
>
> Infinite.
>
>> - Is it the square of the individual drive MTBF? See: http://www.phptr.com/articles/article.asp?p=28689
>
> No. This example applies to something like an unmanned spacecraft, where no repairs or replacements can be made. Such a system has no meaningful MTBF beyond its nominal service life (which will usually be much less than the MTBF of even a single component, when that component is something as reliable as a disk drive).
>
>> Or: http://tech-report.com/reviews/2001q...d/index.x?pg=2 (this one doesn't make sense - if MTTR=0, then MTBF=infinity?)
>
> That's how it works, and this is the applicable formula to use. For completeness, you'd need to factor in the fact that drives have to be replaced not only when they fail but when they reach the end of their nominal service life, unless you reserved an extra slot to use to build the new drive's contents (effectively, temporarily creating a double mirror) before taking the old drive out.
>
>> Or: http://www.teradataforum.com/teradat...107_214543.htm (again, I don't know how MTTR=0 would work)
>
> The same way: though the explanation for RAID-5 MTBF is not in the usual form, it's equivalent.
>
>> - Is it 150% of the individual drive MTBF? See: http://www.zzyzx.com/products/whitep...ility_primer.pdf
>
> No: the comment you saw there is just some half-assed rule of thumb that once again assumes no repairs are effected (and is still wrong even under that assumption, though the later text that explains the value of repair is qualitatively valid).
>
>> - Is it double the individual drive MTBF? (I don't remember where I saw this one.)
>
> No. The second paper that you cited has a decent explanation of why the formula is what it is. If you'd like a more detailed one, check out Transaction Processing: Concepts and Techniques by Jim Gray and Andreas Reuter.

Hi, I'm back, and I'm bottom-posting to one of the earlier posts so that everything is in one place, as this thread is getting a little long - I hope that's ok.

I'm still a little puzzled about your (and I think Ron's) earlier comments on the phptr.com article that I linked above, and I've been trying to reconcile that approach/methodology with the ones from tech-report.com and teradataforum.com. If I run the equivalent (hypothetical) numbers through both, I get vastly different results.

For example, if I:
- assume 100,000 hours MTBF for a single drive/device, and
- have 3 drives in RAID 1, and
- assume 24 hours MTTR, and
- use the tech-report.com/teradataforum.com method,

I get: MTTF(RAID 1) ~ 20 TRILLION hours+

And if I follow the method from phptr.com with the same data, I get:

AFR(1 drive) = 8760/100,000 = 0.0876
AFR(3 drives, RAID 1) = (0.0876)^3 ~ 0.0006722
MTBF(3 drives, RAID 1) = 8760/AFR(3 drives, RAID 1) ~ 13 MILLION hours+

Using the method from the phptr.com page, the MTBF result is WAY less than with the other method. Assuming that the tech-report.com/teradataforum.com method is the more correct one, and that the phptr.com method is this wrong for even a relatively simple RAID 1 configuration, is ANY of the rest of the methodology described on the phptr.com page a valid approach?

The reason for my question is that the next thing I wanted to do was use the method described in the rest of the phptr.com page (i.e., in the case study) to do some ballpark figuring for a more extended system (with more than just the RAIDed drives), similar to what was in the case study, using MTBF numbers that I have for components.

If any of you might be able to shed some (more) light on this, I'd really appreciate it.

Thanks again,
Jim
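For concreteness, the two calculations above can be run side by side in a short Python sketch. The repairable-array formula used here is the standard Gray & Reuter approximation MTTF^N / (N! * MTTR^(N-1)); whether the tech-report.com/teradataforum.com articles use exactly this constant is an assumption, so the first number need not match Jim's 20-trillion figure.

    from math import factorial

    # Assumptions from the post: 100,000 h per-drive MTTF, 24 h MTTR,
    # a 3-way RAID 1 mirror, 8760 hours per year.
    MTTF = 100_000.0
    MTTR = 24.0
    N = 3
    HOURS_PER_YEAR = 8760.0

    # Method 1: repairable array (Gray & Reuter style). Data is lost only
    # if all N drives fail within overlapping repair windows:
    #   MTTF_array ~ MTTF^N / (N! * MTTR^(N-1))
    # Note that MTTR -> 0 drives this to infinity, matching Bill's "Infinite".
    mttf_repairable = MTTF**N / (factorial(N) * MTTR**(N - 1))
    print(f"repairable-array estimate: {mttf_repairable:.3e} hours")

    # Method 2: the phptr.com AFR approach, which models no repairs.
    afr_drive = HOURS_PER_YEAR / MTTF        # = 0.0876
    afr_array = afr_drive**N                 # ~ 0.0006722
    mtbf_afr = HOURS_PER_YEAR / afr_array    # ~ 13 million hours
    print(f"AFR-method estimate:       {mtbf_afr:.3e} hours")

The gap Jim observes is exactly the modeling difference: the first method credits the array for repairs (every drive must fail before the earlier failures are fixed), while the AFR method quietly treats each year's failures as permanent, which is why its number is orders of magnitude lower.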
#23
"Ron Reaugh" wrote in message ...
And you're wrong - utterly. When you have a disk failure in your RAID-1 pair, and only *then* discover that a data sector on the surviving disk is also bad, you've lost data - i.e., 'failed'. That's not my definition of failed. I wrote data to the disk. It didn't come back. Sounds like failure to me. |
#26
In article , Ron Reaugh wrote:
> ... A first stab at that process is called nightly backup and the second stab is scheduled defrags. "Silent sector deterioration" can happen but is usually an isolated sector here or there and is quite uncommon.

Yes, good arrays all have scrubbing capabilities (or should have them). But life isn't quite so easy. Many disk workloads show very high locality: for long stretches, the actuator stays at or near the same position. If you start scrubbing carelessly while a low-intensity foreground workload is running, the response time for real I/Os can increase quite precipitously. So the trick with implementing scrubbing is to forecast when the foreground workload will be idle. Like all forecasting of the future, this is quite difficult (if I knew how to do it, I would play the stock market and get out of the storage business).

Note that good scrubbing has to be done internally to the array, because external scrubbing (for example, a full backup, or just reading the block device end to end) will not touch all sectors on all disks. And depending on how the array is implemented, it may never touch some sectors (for example, as long as no disk has failed, most arrays will never read the parity block in a RAID-5 group). So this isn't something the user of a disk array can take care of himself.

> Good RAID 1 will fill the new/replacement drive in spite of such a sector read error, and then one is left with an operable system with an isolated read error that may be dealt with. Depending on the definition of "data loss", this issue may not count, and it is relatively obscure. Modern HDs are quite good at being able to read/recover their data.

Well, the promise of RAIDed disks is that there is NO data loss. I personally think that as soon as I lose a sector, I have violated my contract with the end user. Clearly, losing one sector is better than losing a whole LUN or a whole array. But if that sector is in an allocated area (of the file system or the database that sits above), the array has corrupted or invalidated data. That's why to many customers the first bit error invalidates the whole LUN - as soon as you lose a single sector, you'll have some explaining to do (often in the form of a C-level executive calling the customer to apologize, followed by massive price cuts or rebates).

If you look at the introduction history and market penetration of the big disk arrays (EMC Symmetrix, Hitachi Lightning, IBM Shark, and so on), you'll see that the "public perception" of data reliability has been a big factor in selling and pricing; I don't want to go into details, as they are sure to step on someone's toes. Whether the "public perception" of data reliability is actually correlated with the real incidence of data loss is an interesting study in mass psychology and the power of marketing over engineering. But what is clear is that there are many customers who are perfectly willing to pay a lot of extra money (a factor of 2, 3, or 10 more than the lowest bidder) to select a vendor that gives them a warm and fuzzy feeling (and maybe also real technical advantages, or even contractual guarantees) about the quality and reliability of the disk array.

--
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
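A toy Python sketch of the idle-aware, array-internal scrub loop described above. Everything here (ToyDisk, the 0.5 s idle threshold, read_ok/repair) is illustrative, not any real array's firmware interface:

    import time

    # Toy model of an array-internal scrub pass: walk every sector of every
    # member disk (including parity sectors that normal reads never touch),
    # but only issue scrub reads when the foreground workload looks idle,
    # so real I/Os don't see the actuator dragged away.

    IDLE_THRESHOLD_S = 0.5  # assumed: no foreground I/O for this long == idle

    class ToyDisk:
        def __init__(self, sectors):
            self.sectors = sectors
            self.bad = set()            # sectors whose ECC check would fail

        def read_ok(self, sector):      # stand-in for a verified media read
            return sector not in self.bad

        def repair(self, sector):       # stand-in for rewrite-from-mirror/parity
            self.bad.discard(sector)

    class ToyScrubber:
        def __init__(self, disks):
            self.disks = disks
            self.last_foreground_io = time.monotonic()

        def note_foreground_io(self):
            # The real I/O path calls this; the scrubber backs off afterwards.
            self.last_foreground_io = time.monotonic()

        def foreground_idle(self):
            return time.monotonic() - self.last_foreground_io > IDLE_THRESHOLD_S

        def scrub_pass(self):
            for disk in self.disks:
                for sector in range(disk.sectors):
                    while not self.foreground_idle():
                        time.sleep(0.05)        # yield to real I/O
                    if not disk.read_ok(sector):
                        disk.repair(sector)     # latent error found and fixed

Both of Ralph's points are visible in the sketch: the scrubber iterates raw member sectors (including parity a healthy array never reads, which is why it must live inside the array), and its only real intelligence is the foreground_idle() guess - the forecasting problem he jokes about.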
#27
In article , Ron Reaugh wrote:
> Does a bad sector that happens to be detected during a RAID 1 HD failure and replacement constitute any reflection on the efficacy of that recovery? I say no.

For an enterprise-class disk array, this is catastrophic (see previous message). It will cause an alert to field service personnel. Often, the customer will have to be told officially (even if the customer has not detected the read failure yet).

For a small RAID array (for example, a RAID card on the PCI bus with 2 or 4 drives, and a single-system file system like NTFS or ext3 on top): most of the time, nobody cares. The performability expectation for such a system is sufficiently low that the loss of sectors can often be tolerated - in particular because in typical file system workloads (excluding databases), much of the data is written, read maybe for a short period after being written (for example, by the next nightly backup), and never touched again.

> Does undetected "silent sector deterioration" actually pose much of a threat to real-world, current, two-drive RAID 1 reliability? I say no.

Sorry, but for large disk arrays (which typically have many hundreds or a few thousand disks in them) this is right up there among the top failure modes (excluding the ones that can't be dealt with anyhow, like meteorites, fire, or software bugs) - together with complete failure of the 2nd disk, and failure of the 2nd disk induced by the extra stress of RAID recovery. With the very large disks of today, the detected and undetected failure of individual sectors is beginning to be a very significant worry, and I can assure you that the large companies in this sector (their names are typically 2- or 3-letter abbreviations, for example [IEHS][BMPu][CMn], plus Hitachi and NetApp) are putting significant research and development effort into new forms of redundant storage that can survive such problems better.

By the way, I keep saying "RAID-1" and "2nd disk", even though a lot of the large arrays are actually formatted to RAID-5 or other parity- or erasure-code-based schemes; the examples are just easier for RAID-1.

One particularly worrisome trend is "off-track writes", which are rumored to be more common in consumer-grade disks (typically IDE disks): if mechanical vibration occurs during a write, the head might wander off and write the new data slightly off the track, without completely overwriting the old data on the track. If you now seek away and come back to read later, you can get lucky and by coincidence settle on the new data, or you can get unlucky, hit the old track, and read old data (which is still there with perfectly valid ECCs - maybe not for a whole track, but for a few sectors). You can see how this can be quite catastrophic, even in a non-redundant system. It gets really juicy if this happens during a RAID-5 reconstruction, because now you will take this old data, XOR it with the other disks in the RAID group, create absolute gibberish, and then write the gibberish back to disk, thinking that it is valid. In a RAID-1 system, an off-track read at least returns data that used to be valid (small consolation).

What you might detect here is a certain mindset. We all know that individual disks are fallible, and we've learned to live with it (the operative word here is "backup"). For small RAID arrays (often based on motherboards or PCI cards, or hidden in the back end of NAS servers), a few simple steps give a huge improvement in reliability, but the systems are still considered somewhat unreliable. For most personal and small-business users, these small RAID systems give you a huge bang for the buck. But once you enter the realm of the big enterprise storage systems, things change, and you MUST NEVER EVER LOSE DATA (in all upper case), because if you do, high-level executives will have their busy schedules interrupted, and your behind as the engineer will be on the line, or toast. The reason the enterprise storage systems are so expensive (in terms of $/GB) is that they are fantastically well built, and vendors go to extraordinary lengths to stand behind them. One of these days, if you buy me a few beers, I'll tell you the story of the big array vendor who offered to truck in pallets full of batteries every 24 hours to keep his disk array running through a multi-day power outage (because shutting it down was considered to increase the risk of data loss).

--
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
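The RAID-5 reconstruction accident Ralph describes is easy to demonstrate. In this Python sketch (made-up eight-byte blocks), an off-track write leaves stale data with valid ECCs on disk 0; when disk 1 later dies, the rebuild XORs the stale block with the up-to-date parity and writes back gibberish:

    # RAID-5 toy: parity is the XOR of all data blocks in a stripe.
    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    d0_old = b"OLD DATA"      # stale data the off-track write failed to replace
    d0_new = b"NEW DATA"      # what the host believes is on disk 0
    d1     = b"BLOCK_D1"      # data on disk 1
    parity = xor(d0_new, d1)  # parity was correctly updated for the new data

    # Disk 1 dies. The rebuild reads disk 0, but the head settles on the
    # stale track, which still carries perfectly valid ECCs:
    rebuilt_d1 = xor(d0_old, parity)

    print(rebuilt_d1)          # gibberish, silently written back as "valid"
    print(rebuilt_d1 == d1)    # False - the stripe is now corrupt

In the RAID-1 version of the same accident, the rebuild at least copies data that used to be valid - the "small consolation" above.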
#28
"Robert Wessel" wrote in message om... "Ron Reaugh" wrote in message ... And you're wrong - utterly. When you have a disk failure in your RAID-1 pair, and only *then* discover that a data sector on the surviving disk is also bad, you've lost data - i.e., 'failed'. That's not my definition of failed. I wrote data to the disk. It didn't come back. Sounds like failure to me. A single sector lost does not constitute RAID 1 failure. Does RAID 1 operate whereby each read is redundant and then the two read datasets are compared in OS buffers? NO! There is a failure rate that such would catch although obscure. Does that constitute a RAID 1 failure? Folks are grasping into obscurity and very low probabilities. |
#29
"Ralph Becker-Szendy" wrote in message news:1089998384.642616@smirk... In article , Ron Reaugh wrote: ... A first stab at that process is called nightly backup and the second stab is scheduled defrags. "silent sector deterioration" can happen but is usually an isolated sector here or there and is quite uncommon. Yes, good arrays all have scrubbing capabilities (or should have them). But life isn't quite so easy. Many disk workloads show very high locality: For long stretches, the actuator stays at or near the same position. If you start scrubbing carelessly while a low-intensity foreground workload is running, the response time for real IOs can increase quite precipitously. So the trick with implementing scrubbing is to forecast when the foreground workload will be idle. Like all forecasting of the future, this is quite difficult (if I knew how to do it, I would play the stock market, and get out of the storage business). Note that good scrubbing has to be done internally to the array, because external scrubbing (for example a full backup, or just reading the block device end to end) will not touch all sectors on all disks. And depending on how the array is implemented, it may never touch some sectors (for example, as long as no disk has failed, most arrays will never read the parity block on a RAID-5 group). So this isn't something the user of a disk array can take care of himself. Good RAID 1 will fill the new/replacement drive inspite of such a sector read error and then one is left with an operable system with an isolated read error that may be dealt with. Depending on the definition of "data loss" this issue may not count and is relatively obscure. Modern HDs are quite good at being able to read/recover their data. Well, the promise of RAIDed disks is that there is NO data loss. Well, one has to define that very carefully. Firstly differentiating "loss" and "error". I personally think that as soon as I lose a sector, I have violated my contract with the end user. Remember that this discussion was about two drive RAID 1. Clearly losing one sector is better than losing a whole LUN or a whole array. But if that sector is in an allocated area (of the file system or the database that sits above), the array has corrupted or invalidated data. That's why to many customers the first bit error invalidates the whole LUN - as soon as you lose a single sector, you'll have some explaining to do (often takes the form that a C-level executive has to call the customer and apologize, followed by massive price cuts or rebates. And what percentage of "bit error" goes undetected overall system wise? If you look at the introduction history and market penetration of the big disk arrays (EMC Symmetrix, Hitachi Lightning, IBM Shark and so on), you'll see that the "public perception" of data reliability has been a big factor in selling and pricing; I don't want to go into details, as they are sure to step on someones foot. Whether the "public perception" of data reliability is actually correlated with the real incidence of data loss is an interesting study in mass psychology and the power of marketing over engineering. But what is clear is that there are many customer who are perfectly willing to pay a lot of extra money (a factor of 2, 3 or 10 more than the lowest bidder) and select a vendor that gives them a warm and fuzzy feeling (and maybe also real technical advantages, or even contractual guarantees) about the quality and reliability of the disk array. 
Two drive modest configuration RAID 1 arrays are the issue. |
#30
Ralph Becker-Szendy wrote in message news:1089999993.239394@smirk...
>> Does a bad sector that happens to be detected during a RAID 1 HD failure and replacement constitute any reflection on the efficacy of that recovery? I say no.
>
> For an enterprise-class disk array, this is catastrophic (see previous message).

See the thread title, the thread itself, and what the issue is.

> It will cause an alert to field service personnel. Often, the customer will have to be told officially (even if the customer has not detected the read failure yet).
>
> For a small RAID array (for example, a RAID card on the PCI bus with 2 or 4 drives, and a single-system file system like NTFS or ext3 on top): most of the time, nobody cares.

Now we're back to our thread and my point.

> The performability expectation for such a system is sufficiently low that the loss of sectors can often be tolerated - in particular because in typical file system workloads (excluding databases), much of the data is written, read maybe for a short period after being written (for example, by the next nightly backup), and never touched again.
>
>> Does undetected "silent sector deterioration" actually pose much of a threat to real-world, current, two-drive RAID 1 reliability? I say no.
>
> Sorry,

No - read it again: "Does undetected 'silent sector deterioration' actually pose much of a threat to real-world, current, two-drive RAID 1 reliability? I say no."