#1
RAID 5 corruption, RAID 1 more stable?
On several occasions I have seen situations where faulty UPSs caused servers with RAID 5 arrays to reboot continuously, which corrupted either the RAID array itself or the file system. I am considering recommending RAID 1 whenever possible, because I suspect it would be more resilient under the same conditions: I would have two separate copies of the system, and I do not suspect that mirroring would mirror NTFS corruption or suffer from the problems of RAID 5 array corruption. I would like to hear your opinions on this. Thanks
#4
RAID 5 corruption, RAID 1 more stable?
In article ,
wrote:

> On several occasions I have seen situations where faulty UPSs caused servers with RAID 5 arrays to reboot continuously, which corrupted either the RAID array itself or the file system.

OK, let's analyze this. Did the continuous reboots cause:

A. The disk array to suffer so many errors (for example, disk errors on the actual spinning platters, or hardware errors, such as in the memory cache of your disk array) that it cannot correct for them, because it is designed to handle only one error at a time?

B. The disk array to corrupt data on disk?

C. The attached host and its filesystem to become "confused" and write incorrect data to the disk array, which the disk array correctly stored, but which now shows up as corruption?

From your description, we cannot distinguish exactly what you have observed. Let's analyze those three scenarios in reverse order:

C. There is nothing the disk array can do if the host is broken and writes incorrect data to it. If you have a broken host, or a broken file system running on the host, it doesn't matter whether your recording medium is the world's most reliable disk array or some junk from the surplus store. Fix your host.

B. If your disk array is so badly built that it corrupts data on disk (meaning it deliberately writes wrong data to disk, or loses the ability to correct disk errors), then it is a piece of crap, and you need to either replace it with a quality product or have the vendor fix it. HOWEVER, it is true that RAID 1 is simpler to implement than RAID 5, in particular if you do not require stable and serialized reads after a failure (meaning that after a failure, the returned data is still data that was previously written, but not necessarily the data that was most recently written, nor necessarily always the same data).
If you happen to have a really crappy disk array with sort-of broken firmware, running it in RAID 1 is likely to stress it much less, and you may be able to live with flaws in the RAID implementation in that scenario.

A. If power cycling causes hardware errors, you need a better-quality disk array. To some extent, spinning disks will always be vulnerable to errors, but a well-built power distribution system in the disk array should to some extent protect the disks from errors caused by power cycling. BUT: disks will always fail, and power cycling will increase the rate of disk failure. And it turns out that RAID 5 is actually less resilient against disk failure than RAID 10 (note that I did not write RAID 1 here). Here's why.

For a concrete example, imagine that you have 10 disks, each 1 TB in size (I picked round numbers, not because those are completely realistic, but to make the math easier). If you configure those 10 disks as a 9+P parity RAID 5 array, you will get 9 TB of usable capacity, but ANY failure of 2 disks will cause data loss, and the probability that two sector or track errors on separate disks collaborate to cause data loss is pretty high. If on the other hand you configure those 10 disks as a RAID 10 array (mirrored and striped), then in 40 of the 45 possible two-disk failure combinations (roughly 8 times out of 9) you can actually survive a double disk fault, as long as the two failed disks are not "next" to each other (meaning part of the mirror pair for the same stripe). Similarly, the probability of two sector or track errors causing data loss is also several times lower than for a similar RAID 5 setup. The price you pay is that the usable capacity is only 5 TB. So it is indeed true that RAID 1 (in the guise of the real-world RAID 10 implementation) is statistically more tolerant of disk errors than RAID 5, even though its worst case is the same.

BUT: For a well-designed commercial-use disk array, with sufficient spares, good disk state monitoring, and good power distribution and batteries for clean shutdown, the difference described above should be infinitesimally small. If your disk array, workload, and reliability expectations are such that you need to handle double disk faults, don't run RAID 10 because it can handle them "most" of the time; get a dual-fault-tolerant array.

> I am considering recommending RAID 1 whenever possible because I suspect that it would be more resilient under the same conditions, because I have two separate copies of the system, and I do not suspect that mirroring would mirror NTFS corruption or suffer from the problems of RAID 5 array corruption. I would like to hear your opinions on this.

I, for one, would not agree with that statement until we can figure out what is really causing your problems. But as mentioned above, using RAID 1 will make things easier in several respects, and might be a workable band-aid to reduce the incidence of such problems to a tolerable level. You might also mask a much graver problem, so when it eventually comes back to bite you, it hurts even more.

Can you tell us: What type of disk array, what type of host, what type of connection, what type of workload? What are all the details of the corruption? Did the disk array management software (you are running management software, right?) report data errors? Can you look in the log files of your host to see whether disk errors were logged? What have the vendors of your host, OS, and disk array contributed to solving the problem (you have full support contracts, right?)?

Good luck! If you really care about your data, chisel it on stone tablets. Make two copies. Remember what happened to Moses when he dropped commandments 11 through 15.

-- Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca_dot_us 735 Sunset Ridge Road; Los Gatos, CA 95033
#5
RAID 5 corruption, RAID 1 more stable?
In article ,
wrote:

> mirroring would mirror NTFS corruption or suffer from the problems of RAID 5 array corruption. I would like to hear your opinions on this.

Well, unless you start with very big discs, you're still going to need to involve RAID 5 even if you mirror. That being said, I've definitely seen instances where file system corruption on disc A was happily mirrored to disc B before it was discovered. Maybe put the disk array on a UPS that's on the UPS? This is why Ghu, the great, has given us LTO drives.
#6
RAID 5 corruption, RAID 1 more stable?
Dan Rumney wrote:

> There's nothing inherent in RAID-5 that makes it susceptible to corruption due to continuous rebooting of the controller.

Equally, there are more disk writes per write from the host. If it's a controller with volatile cache memory, and you're always losing power, this could be a problem. If you write/change 1 bit on a RAID 5 volume, the controller has to read in 64 kB (just a typical value), recalculate parity across the drives, then write it all back. Plain mirroring might still use 64 kB stripes, but only across two disks, not 3 or more.

> there's nothing inherent in RAID-1 that makes it more robust in these scenarios.

There are also fewer drives to possibly have lost writes to with a mirror than with RAID 5. A crappy RAID 1 controller may not even notice that the data across both disks doesn't match, though.

> If the corruption is being caused by the controller it doesn't matter if you have mirrored copies of your data; the controller will just write the corruption to one or both copies. Also, if the corruption is truly at the NTFS level, then you should be looking at your filesystem and not the storage controller.

True, but if a controller is writing garbage to disk, NTFS will notice.
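The small-write penalty described here comes from RAID 5's XOR parity update rule: to change one chunk, the controller reads the old data and old parity, recomputes, and writes both back. A minimal sketch, with chunk sizes and values chosen purely for illustration:

```python
# Sketch: RAID 5 small-write (read-modify-write) parity update.
# new_parity = old_parity XOR old_data XOR new_data
# Even a 1-bit change costs two reads plus two writes, where a
# plain mirror would just write the new data to both disks.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0x00] * 8)           # chunk being modified
old_parity = bytes([0xFF] * 8)           # parity over the whole stripe
new_data   = bytes([0x01] + [0x00] * 7)  # one bit flipped

new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
print(new_parity.hex())  # prints: feffffffffffffff
```

This is also why volatile write cache plus power loss is dangerous for RAID 5: if power dies between the data write and the parity write, the stripe's parity is stale (the classic RAID 5 "write hole"), which is what battery-backed or non-volatile controller cache is meant to prevent.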
#8
RAID 5 corruption, RAID 1 more stable?
Cydrome Leader wrote:

>> There's nothing inherent in RAID-5 that makes it susceptible to corruption due to continuous rebooting of the controller.
>
> Equally, there are more disk writes per write from the host. If it's a controller with volatile cache memory, and you're always losing power, this could be a problem.

If you are using a RAID controller or array with no safe memory, that is the first mistake, presuming you care enough about the data to spend for the 1/n overhead and write-cycle overhead of RAID.

> If you write/change 1 bit on a RAID 5 volume, the controller has to read in 64 kB (just a typical value), recalculate parity across the drives, then write it all back.

64 kB for a modestly low-end RAID controller.

> Plain mirroring might still use 64 kB stripes, but only across two disks, not 3 or more.

Mirroring used to be the norm on the midlevel Unix boxen. I saw far more propagation of bad data than recovery from errors, unless there was some additional software/hardware in between that could reasonably unambiguously spot the good copy in reasonable scenarios. More often it was used as a quick way to take a snapshot or clone a file system by making and breaking mirrors.

>> there's nothing inherent in RAID-1 that makes it more robust in these scenarios.
>
> There are also fewer drives to possibly have lost writes to with a mirror than with RAID 5. A crappy RAID 1 controller may not even notice that the data across both disks doesn't match, though.

Yup.

>> If the corruption is being caused by the controller it doesn't matter if you have mirrored copies of your data; the controller will just write the corruption to one or both copies. Also, if the corruption is truly at the NTFS level, then you should be looking at your filesystem and not the storage controller.
>
> True, but if a controller is writing garbage to disk, NTFS will notice.

Not until it reads it. :-)
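The "spot the good copy" problem is easy to see in a toy scrub routine. A hedged sketch: the block contents and the out-of-band CRC here are illustrative assumptions, since plain RAID 1 stores no such checksum, which is exactly the problem being described:

```python
# Sketch: a RAID 1 "scrub" can detect that the two copies differ,
# but mirroring alone carries no information about WHICH copy is
# correct. An out-of-band checksum (assumed here, not part of
# plain RAID 1) is the extra software/hardware that can arbitrate.
import zlib

def scrub(copy_a: bytes, copy_b: bytes, expected_crc: int) -> str:
    if copy_a == copy_b:
        return "consistent"
    # Copies disagree: without a checksum, this is a coin flip.
    if zlib.crc32(copy_a) == expected_crc:
        return "copy_a good, rewrite copy_b"
    if zlib.crc32(copy_b) == expected_crc:
        return "copy_b good, rewrite copy_a"
    return "both copies bad"

good = b"filesystem block"
bad  = b"filesystem blocc"   # a lost or garbled write on one side
crc  = zlib.crc32(good)

print(scrub(good, good, crc))  # prints: consistent
print(scrub(good, bad, crc))   # prints: copy_a good, rewrite copy_b
```

A controller that never scrubs never even reaches the "copies disagree" branch, which is the "may not even notice" case above; and one that scrubs without a checksum can only flag the mismatch, not repair it safely.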