#11
Jesper Monsted wrote:
> "Nik Simpson" wrote:
>>> Ok... "stripe size"... So in a RAID array of 5 disks with a stripe size of 8k, if I submit a request to the RAID controller to write 5,000 bytes, these bytes will not be scattered equally across all drives? Since the size of the data being written is less than the stripe size, all of the data could conceivably be written to one disk?
>> You nailed it. Stripe size is the minimum size of a write to a physical disk in the array. Trying to allocate evenly at the byte level to each disk would be insane in terms of the effect on performance.
> Unless you're using RAID3, where the stripe size is basically one bit.

IIRC, the original RAID definition for RAID3 is striping at the byte level, not the bit level; perhaps you are thinking of RAID2.

-- Nik Simpson
#12
> As already noted, most RAID implementations do not work this way: instead, data is spread across the disks in the array in coarser chunks - usually no smaller than 4 KB per disk, often 64 KB per disk, and there are good reasons in most workloads to make them even larger. Some early implementations of RAID-3 distributed the data at finer grain (much as you describe above), but I've never heard of RAID-0, -1, -4, or -5 doing so.

Bill / Nik / Robert,

Thanks, guys. This is really great information. I need to keep an eye on the total number of IOs/second that my SQL Server is generating. I've learned from this thread that if I have a reasonably small number of disks in my RAID, I can estimate my maximum number of IOPs by multiplying the IOPs rating of an individual disk by the number of disks in the array.

I've also learned from our discussion of "stripe size" that if SQL Server decides to read some data whose size is LESS THAN the stripe size, it may well read this data from a single disk as opposed to reading all of the disks in parallel. Fair enough... but let's say SQL Server needs to read 65k of data [perhaps it's doing a table scan], let's say the stripe size is 64k, and let's say I have a RAID0 array (just to keep the example simple) with 10 disks. What happens in this scenario? Here's what I'm thinking:

- SQL Server sends a single IO request to the HBA. Windows registers this in its Performance Monitor as one IO request.
- The HBA realizes that the first 64k of data that it needs to read is on Disk#0 and the final 1k is on Disk#1. It generates two IO requests, one for each disk, and submits them in parallel.

So the bottom line is: each IO request of 64k or less will generate 1 IO. Each read request that is asking for *more* than 64k will very likely generate MORE than one IO request, since more than one disk needs to be touched. So when trying to estimate the number of IOs/sec that my application will require, I need to consider the number of reads/writes that will exceed the stripe size, since these operations, which Windows perceives as single IOs, are in reality generating multiple IOs.

Is this correct?

David
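A rough back-of-the-envelope sketch of the fan-out David describes, in Python: one logical read becomes one physical IO per disk it touches. The chunk size, disk count, and per-disk IOPS figure are assumed round numbers for illustration, and cache effects are ignored.

```python
# Estimate how many physical disk IOs a single logical request generates,
# assuming a RAID-0 layout and a request smaller than a full stripe (so each
# touched disk is hit once).

CHUNK_SIZE = 64 * 1024    # assumed per-disk chunk size
NUM_DISKS = 10            # assumed RAID-0 array
PER_DISK_IOPS = 150       # assumed rating of one spindle (made-up figure)

def physical_ios(offset, length):
    """Number of per-disk IOs one logical request generates (ignoring cache)."""
    first_chunk = offset // CHUNK_SIZE
    last_chunk = (offset + length - 1) // CHUNK_SIZE
    return last_chunk - first_chunk + 1

# A 65k read starting at a chunk boundary spans two chunks -> two disks.
fan_out = physical_ios(0, 65 * 1024)
print(fan_out)                                # 2

# Logical requests/sec the array could sustain if every request behaved like this:
print(NUM_DISKS * PER_DISK_IOPS // fan_out)   # ~750
```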
#13
David Sworder wrote:
> [...]
> So when trying to estimate the number of IOs/sec that my application will require, I need to consider the number of reads/writes that will exceed the stripe size, since these operations, which Windows perceives as single IOs, are in reality generating multiple IOs. Is this correct?

Yup, you pretty much got it.

-- Nik Simpson
#14
"David Sworder" wrote in message ... .... So the bottom line is: Each IO request of 64k or less will generate 1 IO. Only if the request happens to be perfectly aligned with the stripe layout. Otherwise, the best you can guarantee is the each (read) request equal to or smaller than the per-disk 'chunk' size will require no more than 2 (parallel) disk accesses. That's one reason that larger chunk sizes are desirable: they minimize the likelihood that a single request will span multiple disks (at least in situations where request sizes vary: if you can control the environment such that request sizes are always appropriately aligned and never exceed your chunk size, then there's no reason to increase that chunk size). Accessing data from multiple disks in parallel does improve response time for large requests, but even if that's all you're interested in there's little reason to use a size any less than 64 KB - 128 KB; if large-request latency is not a very important aspect of your workload, then chunk sizes in the multi-megabyte range may be appropriate, since they'll minimize multiple-disk seeks over the widest range of request sizes and hence maximize throughput if the workload has a good deal of parallelism in it. Each read-request that is asking for *more* than 64k will very likely generate MORE than one IO request since more than one disk needs to be touched. Unless part of the request hits in cache, it *will* generate more than a single request if it exceeds the chunk size - no 'very likely' about it. So when trying to estimate the number of IOs/sec that my application will require, I need to consider the number of reads/writes that will exceed the stripe size since these operations, which Windows perceives as single IOs, are in reality generating multiple IOs. Is this correct? And whatever multiple disk I/Os are generated by write activity. Large writes (that span an entire array stripe) are relatively more efficient: a full-stripe write doesn't have to perform any reads at all, it just plunks down the data on the n-1 data disks in the stripe and calculates parity directly from that data for the parity chunk. Reasonably smart arrays perform intermediate optimizations, such that the number of disk accesses is minimized (e.g., if you're writing to all but one data disk in the stripe, it's cheaper to read the remaining unwritten chunk than it would be to read all the chunks on the disks that you're modifying). - bill |
#15
"Nik Simpson" wrote in message ...
IIRC, the original RAID definition for RAID3 is striping at the byte level, not the bit level, perhaps you are thinking of RAID2. Both RAID2 and RAID3 are (effectively) striped at the bit level - the smallest addressable unit ("sector" or "block") of the array is split across all the drives, thus reading (or writing) that unit requires hitting all those drives (hopefully in parallel). What's different is how the error correction works. In RAID2 an EC scheme is used on a bit-by-bit basis, in RAID3 you've got a block parity scheme just like in RAID4/5. I've never actually seen a RAID2 implementation, but it's possible someone has one somewhere. The point of RAID2/3 is to improve *sequential* I/O performance. Random I/O performance is that of a single drive, but sequential performance is improved proportionally to the number of data disks in the array. Typically RAID3 arrays have the spindles synchronized for best performance. Mostly used by the HPC folks. |
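A quick arithmetic sketch of that RAID2/3 trade-off in Python; the per-drive numbers are made-up round figures for illustration, not measurements.

```python
# Every sector is split across all data disks, so every request occupies every
# spindle: random IOPS stays at a single drive's rate, while sequential
# bandwidth scales with the number of data disks.

data_disks = 4
drive_iops = 150    # assumed random IOPS of one spindle
drive_mb_s = 50     # assumed sequential MB/s of one spindle

print("random IOPS        :", drive_iops)               # same as one drive
print("sequential MB/s    :", drive_mb_s * data_disks)  # scales with data disks
```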
#16
"Nik Simpson" wrote in message . ..
David Sworder wrote: Here comes the term "stripe size". This is the number of consequtive bytes allocated on the same disc. Depending on your performance requirement you will chose a small or large stripe size. (8k-64k or even much larger) Ha! Just when I think I'm beginning to get a handle on things, a new term/concept comes along that reveals just how ignorant I really was (am). Ok... "stripe size"... So in a RAID array of 5 disks with a stripe size of 8k, if I submit a request to the RAID controller to write 5,000 bytes, these bytes will not be scattered equally across all drives? Since the size of the data being written is less than the stripe size, all of the data could conceivably written to one disk? You nailed it. Stripe size is the minimum size of a write to a physical disk in the array. Trying allocate evenly at the byte level to each disk would be insane in terms of the effect on performance. That's not correct. The minimum size of a write on all RAID5 arrays remains a single sector. However any write smaller than a complete stripe requires a read-modify-write cycle for the appropriate data and parity blocks. Often you get to define the stripe size indirectly by specifying a per-disk block size (often that's the 64KB number that's tossed around). The strip size for RAID4/5 is then (n-1) times the block size (for the five drive array being discussed, that would result in a 256KB stripe, a six drive array would have a 320KB stripe). Vendor usage of the terms block and stripe are often more than a bit confusing. Ideally, you'd like all your (random) reads to fit within a single block (which would allow the read to be satisfied by hitting only a single disk), and all your writes to cover an entire stripe (which would allow the stripe to be written without the read-modify-write cycle). Obviously those two goals conflict. For most database workloads, reads dominate, and the writes that do occur tend to be tiny in relation to practical stripe sizes (so you almost never get a full stripe write). So most database applications just set up a nice large stripe. |
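A minimal Python sketch of the block/stripe arithmetic above, assuming the 64KB per-disk block size used as the example in this thread (the constant and helper name are illustrative, not a recommendation).

```python
# RAID-4/5 terminology: the per-disk "block" (chunk) is what you configure;
# a stripe is (n - 1) data chunks plus one chunk of parity, so usable data
# per stripe is (n - 1) * block_size.

BLOCK_SIZE = 64 * 1024   # assumed per-disk block size

def raid5_stripe_size(total_disks, block_size=BLOCK_SIZE):
    """Usable data bytes per stripe: (n - 1) data chunks, one chunk of parity."""
    return (total_disks - 1) * block_size

print(raid5_stripe_size(5) // 1024)   # 256 (KB) for the five-drive array discussed
print(raid5_stripe_size(6) // 1024)   # 320 (KB) for a six-drive array
```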
#18
Malcolm Weir wrote:
> On 28 Nov 2003 23:04:02 -0800, (Robert Wessel) wrote:
> Someone did have a RAID2 system, using 37 disks: 32 data plus 5 ECC, and a 16KB "native" block size. It was one of the HPC manufacturers (Thinking Machines, perhaps).

That's what my admittedly foggy memory says as well ;-)

-- Nik Simpson