A computer components & hardware forum. HardwareBanter


Estimating RAID 1 MTBF?



 
 
#51 - July 18th 04, 10:37 AM - Bill Todd


"ohaya" wrote in message ...

....

I don't remember if this was one of the links that I posted:

http://www.sun.com/blueprints/0602/816-5132-10.pdf


It is at least very similar to the first one you posted.


In the above article, the author examines the reliability of several
different configurations,


I'll start writing comments as I read the article, rather than organize them
more formally:

Right off the bat, there's the difference between 'reliability' and
'availability' (though the former term is sometimes not as well-defined as
the latter). Reliability, at least in my view, relates to providing
*correct* results (often in a timely manner), whereas availability relates
to continuing to provide results (with no explicit guarantee that they're
correct).

You simply can't achieve that definition of 'reliability' without redundant
hardware operating in lock-step with constant hardware comparisons of
outputs at one or more levels to make sure that both subsystems agree on the
result of every operation (though in the absence of such lock-step hardware
you can at least improve your odds somewhat with highly defensive
hardware and/or software that frequently checks for - and, in the case of
something like ECC codes, may sometimes be able to correct - errors).  This
provides the minimum 'fail-stop' environment: on disagreement, the system
is brought to a halt until the problem is resolved (or, if you have 3 rather
than 2 sets of hardware operating in lock-step, you can take a majority
vote, vote the odd guy off the island, and just continue running while the
bad module is replaced). Without such lock-step redundant hardware, there
are some module faults which you simply won't detect, and you then have to
not only quantify them at the non-redundant module level but also take them
into account as errors which no amount of non-lock-step redundant hardware
will help with (i.e., whatever portion of the overall MTBF those errors
contribute won't be helped at all by introducing additional redundancy).

Technically, Mean Time To Failure (which can often be equated to Mean Time
Between Failures) is a reliability measure - but only if it takes *all*
failures (subtle as well as obvious) into account. This means that attempts
to evaluate the 'reliability' (or aggregate MTTF) of redundant systems will
be accurate only in systems where *all* failures, including subtle ones
(such as sector deterioration, wild or unexecuted writes, etc.), are
detected early enough to cause the module to be taken out of service (and
repaired/replaced if applicable) before they become compounded.

So most RAID environments are not 'reliable', because they assume that any
operation that they complete without detecting an error is good without
checking it (though there are some RAID-3 arrays that reportedly check the
parity block on every read, which at least gets them part-way there, and the
correction codes present in each disk sector reduce the risks of *some*
kinds of errors to an almost vanishingly low value): they simply provide
for continued operation (availability) without performing comprehensive
checks to ensure that no subtle failure has occurred (though some failures
are not subtle, and do get handled). It's not correct to suggest that such
systems have 'no single point of failure': rather, they have no single
point of failure for some subset (and not necessarily a fairly complete
subset, just the subset for which the disks happen to exhibit reasonably
'fail-stop' behavior because of their internal checks or the calamitous
nature of the failure) of anticipated errors.

The article you cite uses 'reliability' not to mean 'availability' but to
mean 'probability that the system will not fail (in some detectable manner,
rather than insidiously) in a given year' (at least in its definition as 1 -
8760/MTBF, with MTBF measured in hours).  The resulting value has no direct
correspondence to the *aggregate* MTBF figures we've been discussing, nor to
any other commonly-used measure:  it does not measure up-time percentage
(which is what availability is), nor does it take repairs into account, so it
really is only something one might be interested in for unmaintainable
systems like unmanned spacecraft.  I believe that I made this observation earlier.
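
To make that distinction concrete, here is a minimal Python sketch of the two
quantities side by side (the MTBF and MTTR values are illustrative assumptions,
not figures from the article):

    # Daoud-style yearly 'reliability' vs. availability in the up-time sense.
    # Values below are assumed for illustration only.
    mtbf = 100_000.0          # unit MTBF, hours
    mttr = 24.0               # assumed repair time, hours
    hours_per_year = 8760.0

    afr = hours_per_year / mtbf          # annualized failure rate, ~0.0876
    reliability_year = 1 - afr           # chance of surviving a year, ~0.91
    availability = mtbf / (mtbf + mttr)  # fraction of time in service, ~0.99976

    print(reliability_year, availability)

The two numbers answer different questions, which is the point.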

and from looking at the way that he calculates
AFR and MTBF for redundant drives, it looks like it's an approximation
somewhere between a "Hot-standby-maintained" and
"Cold-standby-maintained".

At the beginning of that article, he does a calculation of the AFR for a
redundant pair of drives assuming MTBF of 100,000 hours for the
individual drives. Then, he goes through MTBF calculations for 3
different configurations with drives with MTBF of 1,000,000 hours.

If you take the MTBF of the simple redundant pair (which again, assumed
100,000 hours), I get a MTBF(System) of about 13,031,836 hours.


No, you don't: you get a system MTBF of about 1,141,553 hours. Garbage in,
garbage out: the example you took the aggregate AFR number from to get your
result used *3* disks redundantly.
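
For what it's worth, here is that arithmetic written out as a small Python
sketch, using the article's own AFR-multiplication method (the method itself is
what's in dispute; this just shows where the two figures come from):

    # Back-calculating 'system MTBF' the way the article does.
    mtbf_drive = 100_000.0
    afr = 8760.0 / mtbf_drive        # ~0.0876 per drive per year

    afr_2way = afr ** 2              # mirrored pair
    afr_3way = afr ** 3              # the article's 3-way mirror example

    print(8760.0 / afr_2way)         # ~1,141,553 hours
    print(8760.0 / afr_3way)         # ~13 million hours -- the figure you quoted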

But that's still a lot higher than the number you get if you follow the
standard logic for a duplexed system without repair, which is only a factor
of 1.5x improvement and thus a system MTBF of 150,000 hours (in both cases,
again making the usually incorrect assumption that *all* failures will be
detected and the failed module taken out of service, one might add).

The logic underlying that standard is that in a system with two redundant
disks the MTBF of the *first* disk to fail is only half the normal value,
because either of the two disks may fail. After that first failure, the
MTBF of the remaining disk is assumed to be 'memoryless', so you get another
full single-disk MTBF out of it, for a total of 1.5x the single-disk MTBF.
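
Written out as a tiny sketch (same 100,000-hour drives, standard no-repair
reasoning rather than the article's method):

    # Duplexed pair without repair: time to first failure is halved,
    # then the survivor is assumed 'memoryless'.
    mtbf_drive = 100_000.0

    mttf_first = mtbf_drive / 2          # either of the two drives can fail
    mttf_second = mtbf_drive             # full MTBF from the surviving drive

    print(mttf_first + mttf_second)      # 150,000 hours, i.e. 1.5x a single drive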

I'm inclined to agree that this seems unreasonably conservative (and/or
illogical: it treats durations which are far, far outside the service life
of the elements of the system as if they could simply be added together).
If the probability that a disk will fail over the course of a year is P,
then the probability that both disks will fail over the same period really
does seem as if it ought to be P*P, and back-calculating the result to
generate a system-pair aggregate MTBF seems reasonable.

But it's not, and the easiest way to see this is to choose a different
period to evaluate. Take an hour, for example: the probability that a
100,000-hour MTBF disk will fail in an hour should be 1/100,000, which we'll
call the Hourly Failure Rate (HFR). The probability that both disks will
fail in that hour should then be 1/10,000,000,000. Back-calculate to a
system MTBF and, voila! a 10,000,000,000-hour system MTBF, not the puny
1,000,000+ hours we got when using a year as the evaluation period.
Similarly, if you use a full 5-year period (a typical high-end disk nominal
service life), the calculated system MTBF drops to about 230,000 hours.

Clearly, within the disks' service life the aggregate system MTBF should be
some specific value rather than inversely proportional to the length of the
time-period that you happen to choose to evaluate: there's something subtle
but significant wrong with the logic that the author is using (since he made
the same mistake of believing that system MTBF could be back-calculated from
system AFR).
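
Here is that inconsistency as a short Python sketch (same 100,000-hour drives,
same multiply-the-probabilities method, three different evaluation windows):

    # If the method were sound, all three back-calculated 'system MTBF'
    # values would agree.  They differ by orders of magnitude.
    mtbf_drive = 100_000.0

    for window in (1.0, 8760.0, 5 * 8760.0):   # an hour, a year, a 5-year life
        p_one = window / mtbf_drive            # failure probability in the window
        p_both = p_one ** 2                    # both drives fail in the window
        print(window, window / p_both)         # 1e10, ~1.14e6, ~2.3e5 hours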


Now, if you look at the calculations that he does for the 3
configurations in his Case study (again, he used 1,000,000 hours for the
drive MTBF here, rather than 100,000 hours), the best MTBF was for
Architecture 2, 1,752,000 hours.


So, we have:

MTBF(RAID1 pair of 100,000 hour drives) = 13Mhours
MTBF(Architecture 2 w/1,000,000 hour drives) = 1.7Mhours


My question now is:

Given the above MTBF estimates, and that the MTBF of the simple RAID1
pair of drives (even with 100,000 hours MTBF) is so much higher, purely FROM A RELIABILITY
standpoint, why would anyone ever consider a SAN-type storage
architecture over a RAID1 pair of drives (again, this question is purely
from the standpoint of reliability)?


Because a bare pair of drives is useful only as a couple of paperweights,
perhaps?

Disks don't communicate telepathically with their client's main memory:
they have to go through an ATA or SCSI bus (equivalent to the 'link'
mentioned in the SAN discussion, or two if you want to avoid that being a
single point of failure), and an ATA or SCSI controller (equivalent to the
'HBA' in that discussion, or, again, two if you want to avoid that being a
single point of failure), so the only real additional elements in the SAN
are the switches ('concentrators') and additional links behind them, and the
difference in availability (which is really what they're talking about in
this case) is small.

And any such difference is typically overwhelmed by the availability
increases that accrue from centralizing storage management - because
hardware is reliable enough that most problems usually arise from managing
it, especially if that management must be spread out all over the
corporation.



I've done several spreadsheets following the model in the above article,
and it's very difficult (actually, it might be impossible) to get the
MTBF of any of the architectures in the Case study to come even remotely
close to the MTBF of the simple RAID1 pair (even using 100,000 hours for
the simple pair).


Perhaps you should have instead spent more time reading the article
carefully: even leaving aside the logical problems with the author's
method, noticing that the first example was for 3-way mirroring would have
been at least somewhat helpful.



If you add the fact that when organizations go to SANs, they oftentimes
also have a goal of centralizing all storage for their entire
organization onto the SANs ("eggs in one basket"), I'm even more
puzzled by this...


Sometimes concentrating one's eggs is a good thing - see above. It can also
radically reduce the personnel costs of managing storage (which various
studies suggest are several times as expensive as the storage itself) at the
same time it improves the quality of that management.

Or not: like any technology, it can be misapplied by the incompetent.

One final comment on the question of system availability: it's more complex
than the numbers in the article indicate. For example, if you chose not to
double up your links, or your HBAs, or your switches, you'd have a system
that would become unavailable when one of those modules failed, but you'd be
able to replace it and continue in relatively short order. But if you
failed to replicate your storage, when a disk failed you would not just go
down, but you'd lose the data on it and have to recreate it often fairly
painfully - from backup, rolling forward database logs, etc. - before being
able to continue, and even then there might be some recent changes that
would be lost forever.

So storage is not just a matter of availability: the data on it has
intrinsic value beyond immediate accessibility, and losing it can have far
greater consequences than merely losing some piece of hardware which can be
replaced from spares.

- bill



#52 - July 18th 04, 02:35 PM - ohaya

Bill,

Comments interspersed below...

Jim



My question now is:

Given the above MTBF estimates, and that the MTBF of the simple RAID1
pair of drives (even with 100,000 hours MTBF) is so much higher, purely FROM A RELIABILITY
standpoint, why would anyone ever consider a SAN-type storage
architecture over a RAID1 pair of drives (again, this question is purely
from the standpoint of reliability)?


Because a bare pair of drives is useful only as a couple of paperweights,
perhaps?

Disks don't communicate telepathically with their client's main memory:
they have to go through an ATA or SCSI bus (equivalent to the 'link'
mentioned in the SAN discussion, or two if you want to avoid that being a
single point of failure), and an ATA or SCSI controller (equivalent to the
'HBA' in that discussion, or, again, two if you want to avoid that being a
single point of failure), so the only real additional elements in the SAN
are the switches ('concentrators') and additional links behind them, and the
difference in availability (which is really what they're talking about in
this case) is small.

And any such difference is typically overwhelmed by the availability
increases that accrue from centralizing storage management - because
hardware is reliable enough that most problems usually arise from managing
it, especially if that management must be spread out all over the
corporation.


You're right that I forgot to include something akin to an HBA (e.g., a
SCSI adapter) in trying to model the simple mirrored drive case. I'll
have to do that.

So, for a mirrored 2-drive pair, instead of:

     +---- Drive1 ----+
     |                |
  ---+                +---
     |                |
     +---- Drive 2 ---+

I'd be looking at (assuming a single SCSI adapter):

        +---- Drive1 ----+
        |      (D1)      |
SCSI----+                +---
Adap    |                |
 (S)    +---- Drive 2 ---+
               (D2)

So, the AFR(System) ~ S + (D1 * D2)
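
As a rough sketch of that combination (the adapter MTBF is an assumed value,
and this still follows the article's composition rule):

    # Series element (the adapter) adds its AFR; the mirrored drives
    # contribute the product of theirs.  Values are illustrative only.
    afr_adapter = 8760.0 / 500_000.0     # hypothetical 500,000-hour adapter
    afr_drive = 8760.0 / 100_000.0       # 100,000-hour drives

    afr_system = afr_adapter + afr_drive * afr_drive   # S + (D1 * D2)
    print(afr_system, 8760.0 / afr_system)             # the adapter term dominates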


I'll go ahead and do that.

If I'm understanding the last sentence in the second-to-last paragraph
above (where you said "and the difference in availability (which is
really what they're talking about in this case) is small."), you think
that the availability numbers that I would then get for the redundant
pair of drives (the "S+(D1*D2)") would be in the same range as for a
SAN-type configuration.

(Again, for the purposes of this discussion, I'm trying to focus on the
technical aspects, and "divorce" it from any differences in availability
that might accrue from management/operational ease.)

Is that correct?



I've done several spreadsheets following the model in the above article,
and it's very difficult (actually, it might be impossible) to get the
MTBF of any of the architectures in the Case study to come even remotely
close to the MTBF of the simple RAID1 pair (even using 100,000 hours for
the simple pair).


Perhaps you should have instead spent more time reading the article
carefully: even leaving aside the logical problems with the author's
method, noticing that the first example was for 3-way mirroring would have
been at least somewhat helpful.



I'm really sorry about that!!

Believe me, I have read the article carefully, and in calculations that
I'd been doing on my own, I was using a pair of drives (and not 3
drives). But, as I was typing my post, I was just pulling the
information from the article for the post, and I just plain forgot that
he had a 3-drive configuration in the article.

Honestly, this mistake was not intentional on my part, and I hope that
you believe me, because this discussion has been very illuminating for
me, and I hope that we can continue to pursue it a bit more.

Again, thanks...

Jim
#53 - July 18th 04, 11:52 PM - Bill Todd


"ohaya" wrote in message ...

....

So, for a mirrored 2-drive pair, instead of:

     +---- Drive1 ----+
     |                |
  ---+                +---
     |                |
     +---- Drive 2 ---+

I'd be looking at (assuming a single SCSI adapter):

        +---- Drive1 ----+
        |      (D1)      |
SCSI----+                +---
Adap    |                |
 (S)    +---- Drive 2 ---+
               (D2)


Your ASCII art doesn't translate well to a proportional font, but the only
obvious question is why there seems to be something connecting the two
drives together rather than just connecting each to the adapter.  In the
case of the article's first two architectures, such a link is shown but is
not included in the 'reliability calculations' (such as they are): while it
may reflect an FC loop which theoretically might provide an additional
measure of connection redundancy if exploited, they don't take it into
account (and doing the same thing with SCSI would be more of an adventure).

If you use a single SCSI adapter, that single point of failure will tend to
dominate the rest of the subsystem's availability. For most purposes that's
entirely reasonable (e.g., because the client system isn't any more
available than the single adapter it's using, so doubling up adapters won't
help *overall* system availability noticeably), but its *subsystem* MTBF
value won't compare favorably with a subsystem that uses paired components
throughout. For that matter, a single typical client isn't much more
available than a single disk, either - but in that case, what's important is
not so much guarding against the temporary loss of service while the disk is
replaced with a new one but guarding against loss of the *data* on that disk
(as explained earlier).

A typical SAN arrangement would probably use single adapters in most clients
and single links from each client to the switches (again, because most
clients other than higher-end servers wouldn't be significantly more
available than the single adapter plus link would be). But it would
probably use redundancy in the switch ports (on separate switches) and links
to the RAID-1 storage in any cases where that storage was shared among
multiple clients (because if *any* client is running, it should be able to
see its storage, hence the shared back-end connections must be more
available than the individual client links).


So, the AFR(System) ~ S + (D1 * D2)


Haven't you yet realized that the calculations in the article are completely
bogus for the purposes you're attempting to use them for (determining
overall system MTBF, *especially* in the presence of reasonable repair
activity which they don't even pretend to take into account)? Their entire
approach is wrong: use the mechanisms described in the other article that
do include MTTR.

- bill



#54 - July 19th 04, 03:34 AM - ohaya


So, the AFR(System) ~ S + (D1 * D2)


Haven't you yet realized that the calculations in the article are completely
bogus for the purposes you're attempting to use them for (determining
overall system MTBF, *especially* in the presence of reasonable repair
activity which they don't even pretend to take into account)? Their entire
approach is wrong: use the mechanisms described in the other article that
do include MTTR.



Bill,

Sorry if I'm being obtuse, and please don't think that I'm trying to be
argumentative. I can understand why you would get impatient with me,
but I don't understand why you think the calculations in Daoud's article
are "bogus".

I'm assuming that by the "other article", you're referring to the one by
Jeffrey Pattavina at:

http://www.commsdesign.com/printable...cleID=18311631

If that is the article you were referring to, let me try to explain my
confusion...

I reviewed the formulas that Pattavina came up with for calculating the
MTBF for a redundant system in the hot-standby-with-repair and
cold-standby-with-repair cases, and for hot-standby-with-repair, it looks like
Pattavina got:

MTBF(redundant pair) = (MTBF(drive))^2 / (2 x MTTR)

And for the cold-standby-with-repair case, he got:

MTBF(redundant pair) = (MTBF(drive))^2 / MTTR



If we assume that MTTR is, say, 1 hour, then Pattavina's
hot-standby-with-repair formula becomes:

MTBF(redundant pair) = (MTBF(drive))^2 / 2

in other words, half of the MTBF(drive) squared.


For the cold-standby-with-repair formula, if MTTR is 1 hour, then it
becomes:

MTBF(redundant pair) = (MTBF(drive))^2 / MTTR

or, in other words, the MTBF(drive) squared.


In both the hot-standby-with-repair and cold-standby-with-repair cases,
if MTTR is 1 hour, the formulas that Pattavina came up with say that the
MTBF of the redundant pair is on the order of the square of the
MTBF(drive).
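
Plugging illustrative numbers into those two with-repair formulas (the MTTR
here is an assumption of mine, not a value from Pattavina's article):

    # Pattavina-style with-repair approximations for a redundant pair.
    mtbf_drive = 100_000.0
    mttr = 24.0      # assumed repair/rebuild time in hours

    hot_standby = mtbf_drive ** 2 / (2 * mttr)    # ~2.1e8 hours
    cold_standby = mtbf_drive ** 2 / mttr         # ~4.2e8 hours
    print(hot_standby, cold_standby)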


Now, in Daoud's article/paper, although he defined "AFR" as "8760/MTBF",
isn't he doing essentially the same thing as Pattavina when he (Daoud)
says that for redundant devices, you multiply the AFRs of the individual
devices to get the "System AFR" and then convert back to MTBF?


What part of his calculations do you think is bogus, and why?

In his paper, at the bottom of pg. 5, Daoud does say:

"Note – This is a very intuitive method to determine the reliability
of a system. However, for more complex systems, computer modeling is
used to study the reliability.".

I guess that the way that I've been interpreting that note has been
something like (my words): "This is a quick-and-dirty way to calculate
'ballpark' numbers, but it may not be precise.".


Is that what you're referring to when you say "bogus", i.e., that the
way that he (Daoud) calculates things may be "off" either by a factor of
"2" or because he's assuming some value for MTTR (e.g., "1 hour") that
he hasn't stated outright?


Like I said above, I'm NOT trying to be argumentative, but I may be
confused !!!

I don't know either of the authors (Daoud or Pattavina), but it looks
like they're both coming up with kind of similar calculations/formulas.

Jim
#55 - July 19th 04, 06:37 AM - Bill Todd


"ohaya" wrote in message ...

So, the AFR(System) ~ S + (D1 * D2)


Haven't you yet realized that the calculations in the article are completely
bogus for the purposes you're attempting to use them for (determining
overall system MTBF, *especially* in the presence of reasonable repair
activity which they don't even pretend to take into account)?  Their entire
approach is wrong:  use the mechanisms described in the other article that
do include MTTR.



Bill,

Sorry if I'm being obtuse, and please don't think that I'm trying to be
argumentative. I can understand why you would get impatient with me,
but I don't understand why you think the calculations in Daoud's article
are "bogus".


What part of my demonstration that his analysis yielded three dramatically
different values for system MTBF depending upon whether you considered the
'hourly failure rate', the 'yearly failure rate', or the 'lifetime failure
rate' (5 years) in the intermediate step did you find difficult to
understand?

- bill



#56 - July 19th 04, 10:37 AM - Bill Todd


"ohaya" wrote in message ...

So, the AFR(System) ~ S + (D1 * D2)


Haven't you yet realized that the calculations in the article are completely
bogus for the purposes you're attempting to use them for (determining
overall system MTBF, *especially* in the presence of reasonable repair
activity which they don't even pretend to take into account)?  Their entire
approach is wrong:  use the mechanisms described in the other article that
do include MTTR.



Bill,

Sorry if I'm being obtuse, and please don't think that I'm trying to be
argumentative. I can understand why you would get impatient with me,
but I don't understand why you think the calculations in Daoud's article
are "bogus".


OK, I was perhaps a *bit* short in my previous answer: while reductio ad
absurdum is an entirely legitimate form of proof, it can leave one a bit
unsatisfied if the conceptual flaw in the disproven argument has not been
singled out.


I'm assuming that by the "other article", you're referring to the one by
Jeffrey Pattavina at:

http://www.commsdesign.com/printable...cleID=18311631

If that is the article you were referring to,


No, but it reaches appropriate conclusions, so that's fine.

let me try to explain my
confusion...

I reviewed the formulas that Pattavina came up with for calculating the
MTBF for a redundant system in the hot-standby-with-repair and
cold-standby-with-repair cases, and for hot-standby-with-repair, it looks like
Pattavina got:

MTBF(redundant pair) = (MTBF(drive))^2 / (2 x MTTR)

And for the cold-standby-with-repair case, he got:

MTBF(redundant pair) = (MTBF(drive))^2 / MTTR



If we assume that MTTR is, say, 1 hour, then Pattavina's
hot-standby-with-repair formula becomes:

MTBF(redundant pair) = (MTBF(drive))^2 / 2

in other words, half of the MTBF(drive) squared.


For the cold-standby-with-repair formula, if MTTR is 1 hour, then it
becomes:

MTBF(redundant pair) = (MTBF(drive))^2 / MTTR

or, in other words, the MTBF(drive) squared.


In both the hot-standby-with-repair and cold-standby-with-repair cases,
if MTTR is 1 hour, the formulas that Pattavina came up with say that the
MTBF of the redundant pair is on the order of the square of the
MTBF(drive).


Since there's nothing magical about a 1-hour MTTR time, it is more
illuminating to observe that in both cases the system MTBF is *proportional*
to the square of the unit MTBF, though various MTTR values may make it of an
entirely different order.
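
A quick illustration of that (assumed values, not from either article): the
pair's MTBF stays proportional to the square of the unit MTBF, but the constant
in front is set entirely by the MTTR.

    # Hot-standby pair with repair, over a range of assumed repair times.
    mtbf_drive = 1_000_000.0

    for mttr in (1.0, 24.0, 168.0, 720.0):         # an hour, a day, a week, a month
        print(mttr, mtbf_drive ** 2 / (2 * mttr))  # 5e11 down to ~6.9e8 hours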



Now, in Daoud's article/paper, although he defined "AFR" as "8760/MTBF",
isn't he doing essentially the same thing as Pattavina when he (Daoud)
says that for redundant devices, you multiply the AFRs of the individual
devices to get the "System AFR" and then convert back to MTBF?


No. If you take the concept of repair out of Pattavina's discussion, the
situation changes entirely. And Pattavina was kind enough to cover that
exact case even before he gets into the repair cases.

Guess what? He concludes that in a hot-standby system without repair, the
presence of the redundant device increases the overall MTBF by 50%, not by
anything resembling the square of its value.

You really do need to learn to read more carefully: you exhibit the
characteristics of an inquisitive but impatient and sloppy student who
managed to take away exactly the wrong habits from a speed-reading course.



What part of his calculations do you think is bogus, and why?


That's the area where I still have some sympathy, because while the fact
that his method generates wildly inconsistent results depending upon the
size of the time-slice you use in the intermediate step leaves no doubt that
it's incorrect, it's not intuitively obvious precisely *what* is wrong with
it: you ought to be able to multiply small probabilities of independent
events to arrive at a combined probability for their occurrence, and it's
not obvious why you then cannot back-calculate a system MTBF from the
result.

But studying the details of the repair case at least helps shed some light.
The formula for the 2-disk case clearly does not apply to an infinite MTTR
(what one first might use to try to reduce the repair case to a no-repair
case), since it yields a 0 MTBF for the system. But that formula is derived
under the assumption that MTTR MTBF; when you remove that assumption, the
new result is that the second disk does not improve MTBF at all.

That's also clearly incorrect (the second disk *has* to help *some*), but at
least relatively close to the modest factor of 1.5 that's accepted (and
allegedly proved, though not in as intuitively-satisfying a manner as the
incorrect Daoud result). An MTTR equal to 1/2 the MTBF would crank out the
result we want, but it's not immediately obvious intuitively why that would
be the right value to use (though it might conceivably be related to the
fact that if the first device fails on average half-way through the
time-slice in question, then we'll need an effective MTTR equal to half the
slice to ensure that it won't be repaired before the slice runs out).

In any event, repair clearly matters significantly, and thus the no-repair
case over any lengthy period clearly must generate a dramatically lower
system MTBF than any case where repair is effected quickly.
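
For a concrete sense of the gap (assumed numbers; the hot-standby-with-repair
formula for one case and the accepted 1.5x figure for the other):

    # With-repair vs. no-repair for a mirrored pair of 100,000-hour drives.
    mtbf_drive = 100_000.0
    mttr = 24.0     # assumed repair time, hours

    with_repair = mtbf_drive ** 2 / (2 * mttr)    # ~2.1e8 hours
    no_repair = 1.5 * mtbf_drive                  # 150,000 hours
    print(with_repair / no_repair)                # repair buys a factor of ~1400 here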

I've been kind of hoping that someone else would pop up with a more lucid
explanation of Daoud's problem, since each time I've tried to I've been
unable to stay awake enough to do so. Feel free to consider it a research
project: just don't continue to mess around with an approach that we *know*
is wrong.

- bill



#57 - July 19th 04, 09:33 PM - Dorothy Bradbury

Hope someone is including the MTBF of the RAID card, and the
actual RAID format itself (on the card, on the disk, or a floppy disk),
& whether the replacement RAID card is a perfect match re BIOS/bugs.

I recall a UK ISP who marvelled at their high availability RAID system
for a super-duper NNTP server - until they had to rebuild the RAID. It
seems no-one had 1) calculated how long the rebuild would take even at the
max SDTR, or 2) considered the impact on server performance during that rebuild.

Availability is a function of a chain of components/risks.
--
Dorothy Bradbury
www.stores.ebay.co.uk/panaflofan for quiet Panaflo fans & other items
http://homepage.ntlworld.com/dorothy...ry/panaflo.htm (Direct)


#58 - July 20th 04, 01:01 AM - Peter da Silva

In article ,
Ron Reaugh wrote:
The question is whether the unlikely but theoretically possible loss (there are
other theoretically possible losses which seem to be easily ignored) of a
sector in a two-drive RAID 1 configuration is necessarily catastrophic.


No, that's not the question. The question is whether you're going to be
allowed to redefine "failure" as "catastrophic failure" when everyone else
is quite happy to use the word "failure" to include non-catastrophic
failures.

--
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs
of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All
these things will be lost in time, like chalk-paintings in the rain. `-_-'
Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`
#59 - July 25th 04, 07:15 PM - Jesper Monsted

"Dorothy Bradbury" wrote in news:5eWKc.653
:
I recall a UK ISP who marvelled at their high availability RAID system
for a super-duper NNTP server - until they had to rebuild the RAID. It
seems no-one had 1) calculated how long based even on max SDTR,
and 2) the impact on the server performance during that rebuild.


I recall a major .dk ISP who used Adaptec 5400 RAID controllers and, even
with help from Adaptec, couldn't get the damn things to rebuild once a
drive failed. Then again, at least it didn't go as badly as the other 5400
that just shot its RAID set and refused to talk to it again...

--
/Jesper Monsted
#60 - July 25th 04, 07:18 PM - Jesper Monsted

"Bill Todd" wrote in
:
That's still really good, but not so far beyond something you'd start
worrying about to be utterly ridiculous - at least if you're a
manufacturer (individual customers still have almost no chance of
seeing a failure, but even a single one that does is still very bad
publicity).


HDS claims full responsibility for that and will pay you if that happens
(at least in theory). Has anyone here ever heard of that happening?


--
/Jesper Monsted
 



