#1 teckytim
RAID level write verification
Is it true that RAID levels like 1, 1+0, 3, and 4 verify on the fly that both the data write and the ECC write match, while levels like RAID 5 don't (with a few exceptions at the very high end)? Or is this dependent instead on make and model? TIA
#2 Bill Todd
teckytim wrote:
: Is it true that RAID levels like 1, 1+0, 3, and 4 verify on the fly that both the data write and the ECC write match, while levels like RAID 5 don't (with a few exceptions at the very high end)?

No. No RAID level *requires* any kind of read-after-write verification, though any RAID *implementation* could offer it as an additional feature.

However, I think some RAID-3 implementations verify on the fly that the parity information matches the stripe being *read* (since that has no impact on performance, save for the CPU cycles required by the comparison and the bus cycles consumed by reading the parity). Though I don't recall that the accepted RAID-3 definition *requires* this.

- bill
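As a rough illustration of the read-time parity check Bill mentions: a minimal Python sketch, assuming a simplified byte-striped RAID-3 layout with one dedicated XOR parity drive. The function names are illustrative only, not from any real controller.

    from functools import reduce

    def xor_strips(strips):
        """XOR equal-length strips together, byte by byte."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

    def read_stripe(data_strips, parity_strip):
        """Return the stripe's data, checking stored parity against what was read.

        The check costs only the CPU cycles for the comparison plus the bus
        cycles spent fetching the parity strip; the data strips were being
        read anyway, so throughput is otherwise unaffected.
        """
        if xor_strips(data_strips) != parity_strip:
            raise IOError("parity mismatch: stripe is inconsistent")
        return b"".join(data_strips)

For example, read_stripe([b"\x01\x02", b"\x04\x08"], b"\x05\x0a") passes the check (0x01^0x04 = 0x05, 0x02^0x08 = 0x0a) and returns the concatenated data.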
#3 teckytim
Bill Todd wrote:
: No. No RAID level *requires* any kind of read-after-write verification, though any RAID *implementation* could offer it as an additional feature. [...]

Thanks. I was afraid that was the answer, as so many RAID details are nonstandard, or rather manufacturer-specific.

So if I *require* a RAID implementation that does this, where do I have to look? Are there PCI card products (SATA & SCSI) or RAID boxes, or is this only available in high-end non-DAS RAID like EMC, etc.?

Are background media scans sufficient protection against failing/flaky media, so that the verify feature discussed above is not necessary?

Thanks again.
#4 Bill Todd
teckytim wrote:
: So if I *require* a RAID implementation that does this, where do I have to look? Are there PCI card products (SATA & SCSI) or RAID boxes, or is this only available in high-end non-DAS RAID like EMC, etc.?

Someone else here might know, but I don't.

: Are background media scans sufficient protection against failing/flaky media, so that the verify feature discussed above is not necessary?

I don't think the two are all that closely related. All read-after-write does is verify that the data written was what you intended to write: while this does guard against very low-probability errors like silently-failing null writes or 'wild' writes (though with the latter you have to worry about what got clobbered as well), it isn't likely to be any kind of substitute for background 'scrubbing' to catch deteriorating sectors (which I think are orders of magnitude more likely than unheralded write failures, but that's just my impression).

Sun claims that its new ZFS file system for Solaris has supplementary checksum information that guards data from main memory to disk and back again - you might find a look there interesting. But that's not specifically RAID-related.

- bill
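A minimal sketch of the read-after-write verification under discussion. The 'dev' object here is a hypothetical block device offering write_block/flush/read_block; no real driver API is implied.

    def verified_write(dev, lba, data):
        """Write a block, then re-read it and compare against the buffer."""
        dev.write_block(lba, data)
        dev.flush()  # try to defeat the drive's write cache; a readback
                     # served from cache would mask a failed media write
        if dev.read_block(lba) != data:
            # Catches a silent null write (the target block was never
            # updated) and, indirectly, a wild write that landed elsewhere
            # -- though not *where* it landed, which is Bill's caveat.
            raise IOError("read-after-write mismatch at LBA %d" % lba)

Note what this does not do: a sector that verifies fine today and decays next month is invisible to it, which is why it is no substitute for background scrubbing.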
#5 teckytim
Bill Todd wrote:
: I don't think the two are all that closely related. All read-after-write does is verify that the data written was what you intended to write [...] it isn't likely to be any kind of substitute for background 'scrubbing' to catch deteriorating sectors.

I didn't think they were related, at least not in more than the most general sense. Read-after-write just seems to me a reasonable extra failsafe where data integrity/security trumps all else. That perception could be wrong, though.

I have occasionally read about transient write errors in RAID 5 implementations which writers/posters believe make RAID 5 less reliable than other levels. I have also read about some interesting data protection features in EMC and, I think, NetApp which I believe combat these fears.

It seems to me the likelihood of a flaky drive causing problems increases with array size (drive count), especially in larger ATA arrays. A weakening sector that causes a write to fail, but is not quite weak enough to be marked bad, would cause confusion on a defect scan. I have also seen a drive or two that was failing by corrupting data, yet still spinning and showing little or nothing in the way of bad sectors. It's rare, but I've seen it, and I wouldn't want one such drive to take a crap all over an array's stripes. Read-after-write in addition to background defect scanning makes sense to me, yet I usually see only the latter. That makes me wonder.

: Sun claims that its new ZFS file system for Solaris has supplementary checksum information that guards data from main memory to disk and back again - you might find a look there interesting. But that's not specifically RAID-related.

Very interesting. Will look. Thanks again for the response.
#6 Dave Sheehy
teckytim wrote:
: I have occasionally read about transient write errors in RAID 5 implementations which writers/posters believe make RAID 5 less reliable than other levels. I have also read about some interesting data protection features in EMC and, I think, NetApp which I believe combat these fears.

A block protection scheme (aka DIF) has recently been standardized by T10. That protection scheme has been implemented by a few of the silicon suppliers (including my employer). Look for it to become a pretty common feature in the next couple of years; it has been a proprietary feature of a few storage vendors for a number of years already. The recently announced SGI 4G FC array (OEMed from Engenio) is an example that has this new standardized feature built into it.

Dave
#7 Bill Todd
Dave Sheehy wrote:
: A block protection scheme (aka DIF) has recently been standardized by T10. [...]

If it is indeed now a standard, I suspect that given sufficient effort I could learn its details. But if you found it convenient to post them (at least to the degree that one could understand the technology involved - e.g., is it simply an additional checksum, does it live with the data or separate from it, etc.), it would save me and other curious individuals some time.

Thanks,

- bill
#8 Dave Sheehy
Bill Todd wrote:
: If it is indeed now a standard, I suspect that given sufficient effort I could learn its details. But if you found it convenient to post them [...] it would save me and other curious individuals some time.

The details can be found in the SBC-2 or -3 standard at t10.org; look for the section on "Protection Information". Also, some new 32-bit extended SCSI commands are being proposed to support this functionality. There are some rumblings about adding this to T13 as well, but I'm not familiar with the status of that.

Briefly, 8 bytes of information are appended to each block. There are three fields: a 2-byte CRC of the data, a 4-byte reference tag carrying the low 32 bits of the LBA, and a 2-byte application tag. Theoretically, the information can be applied end to end (i.e., generated at the server and sent to and returned from the array), but that is not a typical deployment (although a few HBA manufacturers are incorporating the feature). The typical deployment is to generate the information in the protocol controller on the front end of the array as the data is written to memory (i.e., data cache). It is written to disk by the back end. The information is validated by both the back end and the front end when the data is read by the protocol controller. When performed in this fashion the data is protected as it traverses the bus (e.g., PCI and PCI-X only have simple parity protection), while it resides in memory, and while it resides on the disk.

Dave
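To make Dave's 8-byte layout concrete, here is a small Python sketch of building and checking one protection-information tuple. It assumes the T10 DIF guard CRC (polynomial 0x8BB7, initial value 0) and the guard / application-tag / reference-tag field order given in SBC's "Protection Information" section; consult the standard itself before relying on either detail.

    import struct

    def crc16_t10dif(data):
        """Guard-tag CRC-16: polynomial 0x8BB7, init 0, no reflection."""
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = (crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1
                crc &= 0xFFFF
        return crc

    def make_pi(block, lba, app_tag=0):
        """Build the 8 bytes appended to a block: 2-byte guard (CRC of
        the data), 2-byte application tag, 4-byte reference tag holding
        the low 32 bits of the LBA."""
        return struct.pack(">HHI", crc16_t10dif(block), app_tag,
                           lba & 0xFFFFFFFF)

    def check_pi(block, pi, expected_lba):
        """Validate a block against its protection information."""
        guard, _app, ref = struct.unpack(">HHI", pi)
        if guard != crc16_t10dif(block):
            raise IOError("guard tag mismatch: data corrupted in flight or at rest")
        if ref != (expected_lba & 0xFFFFFFFF):
            raise IOError("reference tag mismatch: block belongs to another LBA")

In the typical deployment Dave describes, make_pi runs in the array's front-end protocol controller on ingest and check_pi runs at both the back end and the front end on the way back out.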
#9 Bill Todd
Dave Sheehy wrote:
: Briefly, 8 bytes of information are appended to each block. [...] When performed in this fashion the data is protected as it traverses the bus (e.g., PCI and PCI-X only have simple parity protection), while it resides in memory, and while it resides on the disk.

Thanks. That's the kind of thing I thought might be useful a decade ago, though it seems a little stingy today - e.g., limiting the LBA address to 32 bits (common arrays below the level the host system may be aware of already exceed this size, though when used only as a sanity check the low 32 bits of the LBA may be sufficient) and the application-specific area to 16 bits (if both fields were longer, the application-specific area could be used, e.g., to hold a file identifier which would facilitate reconstruction of a file system - I have a vague recollection that IBM's iSeries boxes and their ancestors may have done this).

It should at least allow a host that cares enough to implement the functionality to generate the validation information before the data leaves main memory and check it after it returns. This will catch otherwise undetected bus errors and anything clobbered by a wild write, but unfortunately it still won't detect that the intended destination was never updated (or that a silent null write failure occurred). And the largest single potential market for such a feature could turn out to be SATA-based...

- bill
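A sketch of the host-side, end-to-end use Bill describes, reusing make_pi and check_pi from the previous sketch. The dev object and its *_with_pi methods are hypothetical, standing in for an HBA that passes protection information through.

    def host_write(dev, lba, block):
        # Generate the PI while the data is still in main memory, so
        # anything that mangles it between here and the platter is
        # detectable on the way back.
        dev.write_block_with_pi(lba, block, make_pi(block, lba))

    def host_read(dev, lba):
        block, pi = dev.read_block_with_pi(lba)
        # The guard tag catches bus errors and wild-write damage; the
        # reference tag catches a block returned from the wrong LBA.
        # A dropped write, though, hands back the *old* block with its
        # own perfectly valid PI -- exactly the undetected case Bill
        # notes above.
        check_pi(block, pi, lba)
        return block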
#10 teckytim
Thanks for the follow-up posts, Bill & Dave. Very helpful. In addition, I see a proposal for a "Write Read Verify" feature extension over at T13.org: http://www.t13.org/docs2005/e04129r5...ead_verify.pdf

I am specifically looking for SCSI & SATA DAS and controllers that utilize advanced protection mechanisms such as have been mentioned here. Any product recommendations along those lines? Thanks again for your time.