IOPS from RAID units



 
 
#1 - March 20th 08, 11:44 PM - davidhoffer (posted to comp.arch.storage)


The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.
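
As a rough sketch of that rule of thumb (in Python; the drive counts and per-drive IOPS figure below are made-up example numbers, not from any particular array):

def raid_group_iops(total_drives, hot_spares, parity_drives, iops_per_drive):
    # Rule of thumb: only the data drives contribute; spares and parity
    # drives are subtracted out.
    data_drives = total_drives - hot_spares - parity_drives
    return data_drives * iops_per_drive

# e.g. 12 drives, 1 hot spare, 1 parity drive, ~100 IOPS per drive
print(raid_group_iops(12, 1, 1, 100))   # -> 1000 theoretical IOPS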




#2 - March 21st 08, 06:46 AM - Faeandar (posted to comp.arch.storage)

On Thu, 20 Mar 2008 19:44:27 -0400, davidhoffer wrote:


The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.




Was there a question in there somewhere and I just missed it?

~F
#3 - March 21st 08, 04:15 PM - Cydrome Leader (posted to comp.arch.storage)

davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound

this is a bad assumption.

let me ruin this example.

I write 5 bytes to a raid5 array of 10 drives.

that's one operation to the host.

it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.

we're looking at negative performance gains here in addition to an
incredible increase of operations inside that raid group.


together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.

calculating performance is just stupid. Run benchmarks and see what really
happens as that's what counts.


#4 - March 21st 08, 08:59 PM - [email protected] (posted to comp.arch.storage)

On Mar 21, 11:15 am, Cydrome Leader wrote:
davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound

this is a bad assumption.

let me ruin this example.

I write 5 bytes to a raid5 array of 10 drives.

that's one operation to the host.

it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.



No, it's not. Assuming your write doesn't span a stripe or block boundary, it's two reads and two writes, assuming nothing is cached.

Note that a much larger write (the size of a stripe, and so aligned) can be done with a single set of writes.


we're looking at negative performance gains here in addition to an
incredible increase of operations inside that raid group.



No, it's two reads and two writes no matter how many drives are in the RAID-5 array.


together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.

calculating performance is just stupid. Run benchmarks and see what really
happens as that's what counts.



While it's difficult to do with great precision, some of us do like to do a bit of planning before buying the expensive storage box. Seriously, "just buy the DS8000, and then we'll figure out if it's the right size" isn't going to fly at a lot of places. And while IBM will likely help me do some benchmarking of a DS8000 before I buy it, the process is neither easy nor cheap, so I really don't want to be doing it too much, and I want to use the result of that only for fine tuning.

If I can get projected (uncached) read and write rates from my
database, using W*4+R as my baseline IOPS requirement is a pretty
reasonable first order approximation.
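
A minimal sketch of that estimate in Python, assuming the classic RAID-5 small-write penalty of four backend I/Os per uncached host write (the read and write rates below are hypothetical):

def required_backend_iops(reads_per_sec, writes_per_sec, write_penalty=4):
    # Baseline requirement R + 4*W: each uncached small write costs
    # roughly 2 reads + 2 writes at the disks on RAID-5.
    return reads_per_sec + write_penalty * writes_per_sec

print(required_backend_iops(reads_per_sec=800, writes_per_sec=200))  # -> 1600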
#5 - March 23rd 08, 12:18 AM - Cydrome Leader (posted to comp.arch.storage)

[email protected] wrote:
On Mar 21, 11:15 am, Cydrome Leader wrote:
davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound

this is a bad assumption.

let me ruin this example.

I write 5 bytes to a raid5 array of 10 drives.

that's one operation to the host.

it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.



No, it's not. Assuming your write doesn't span a stripe or block boundary,


It doesn't matter what it spans or how small it is. Everything gets
rewritten. that's why raid5 is crap for write performance and especially
small writes.

it's two reads and two writes, assuming nothing is cached.

Note that a much larger write (the size of a stripe and so aligned),
can be done with a single set of writes.


As the write size approaches the stripe size, the ratio of overhead to
actual writes (the data the user sees and cares about) starts to get
closer to 1, but it still doesn't change that there's all sorts of busy
work going on.

we're looking at negative performance gains here in addition to an
incredible increase of operations inside that raid group.



No, it's two reads and two writes no matter how many drives are in the RAID-5 array.


OK, so I change 513 bytes of data from my host to a RAID-5 array with a 64 kB stripe size and 512-byte blocks. Please explain how that array gets updated with only two writes.

together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.

calculating performance is just stupid. Run benchmarks and see what really
happens as that's what counts.



While it's difficult to do with great precision, some of us do like to do a bit of planning before buying the expensive storage box. Seriously, "just buy the DS8000, and then we'll figure out if it's the right size" isn't going to fly at a lot of places. And while IBM will likely help me do some benchmarking of a DS8000 before I buy it, the process is neither easy nor cheap, so I really don't want to be doing


If you're too lazy and cheap to do testing of an expensive storage unit,
maybe you shouldn't be the person testing and deciding what you buy in the
first place.

Considering the DS8000 series is so costly that you can't even buy one off the IBM site right now, it's pretty reasonable to assume IBM will jump through some hoops to sell you one, even if that involves testing one out.

If I can get projected (uncached) read and write rates from my
database, using W*4+R as my baseline IOPS requirement is a pretty
reasonable first order approximation.


Which still has little bearing on what's going to happen in the real world. That's why we test things out: to cut through the sales-sheet bull****.
#6 - March 23rd 08, 04:33 AM - Bill Todd (posted to comp.arch.storage)

Cydrome Leader wrote:
[email protected] wrote:
On Mar 21, 11:15 am, Cydrome Leader wrote:
davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound

this is a bad assumption.


For many common situations (specifically, where reads constitute most of
the workload and the request sizes are much smaller than the array
stripe segment size) in which IOPS are important (i.e., workloads
dominated by many small accesses: otherwise, bandwidth starts being the
primary concern), it's actually a pretty good assumption.


let me ruin this example.

I write 5 bytes to a raid5 array of 10 drives.

that's one operation to the host.

it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.


No, it's not. Assuming your write doesn't span a stripe or block boundary,


It doesn't matter what it spans or how small it is. Everything gets
rewritten. that's why raid5 is crap for write performance and especially
small writes.


Some people who don't have a clue at least have the sense to shut up
when corrected. Obviously, though, you're not one of them.


it's two reads and two writes, assuming nothing is cached.


100% correct, unless you're unlucky enough to find that the 5 bytes
cross a stripe-segment boundary (with the 64 KB segment size used as an
example later, the chances of this are a bit under 0.01%) in which case
it'll be three reads and three writes (unless the array size is small
enough to make another strategy more efficient).

....

No, it's two reads and two writes no matter how many drives are in the RAID-5 array.


OK, so I change 513 bytes of data from my host to a RAID-5 array with a 64 kB stripe size and 512-byte blocks. Please explain how that array gets updated with only two writes.


Leaving aside the slightly under 1% chance that the 513 bytes happen to
span a stripe segment boundary, you read the two sectors affected by the
update (if they're not already cached, which given that you're updating
them is quite likely), you read the corresponding two parity sectors
from the stripe's parity segment, you XOR the original 513 bytes with
the new bytes, you XOR the result into the corresponding bytes in the
parity segment in memory, you overwrite the original 513 bytes in the
data segment in memory with the changed data, you write back the two
sectors of the data segment, and you write back the two sectors of the
parity segment.

One or at most two small reads, plus two small writes. I suspect that
your understanding of how RAID-5 actually functions is seriously flawed,
but here's at least a start on correcting that situation (which of
course you should have attempted before making a fool of yourself, but
better late than never).
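
A minimal Python sketch of that read-modify-write sequence, operating on whole 512-byte sectors; it is purely illustrative, not how any particular controller is implemented:

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data, new_data, old_parity):
    # new parity = old parity XOR old data XOR new data, then the data
    # sectors and the matching parity sectors are written back.
    delta = xor_bytes(old_data, new_data)
    new_parity = xor_bytes(old_parity, delta)
    return new_data, new_parity

old_data   = bytes(1024)                  # the two affected 512-byte sectors (read 1)
old_parity = bytes([0xFF] * 1024)         # the matching parity sectors       (read 2)
new_data   = b"x" * 513 + old_data[513:]  # the 513 changed bytes, padded out
data_out, parity_out = raid5_small_write(old_data, new_data, old_parity)
# data_out and parity_out are the two small writes that go back to disk.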


together into a RAID group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense, of course.

calculating performance is just stupid.


Actually, *not* making serious attempts to estimate performance is
stupid: not only do benchmarks (even those run with your own
application) often fail to uncover important corner cases, but comparing
your estimates with the benchmarks validates your own understanding of
your hardware and its relation to your workload (or, perhaps even more
importantly, exposes gaps in it).

....

While it's difficult to do with great precision, some of us do like to do a bit of planning before buying the expensive storage box. Seriously, "just buy the DS8000, and then we'll figure out if it's the right size" isn't going to fly at a lot of places. And while IBM will likely help me do some benchmarking of a DS8000 before I buy it, the process is neither easy nor cheap, so I really don't want to be doing


If you're too lazy and cheap to do testing of an expensive storage unit,
maybe you shouldn't be the person testing and deciding what you buy in the
first place.


Hmmm - you hardly sound like someone competent to be making judgments in
this area: let's hope that your employer doesn't drop in here often.

- bill
#7 - March 23rd 08, 08:33 PM - Cydrome Leader (posted to comp.arch.storage)

Bill Todd wrote:
Cydrome Leader wrote:
[email protected] wrote:
On Mar 21, 11:15 am, Cydrome Leader wrote:
davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound

this is a bad assumption.


For many common situations (specifically, where reads constitute most of
the workload and the request sizes are much smaller than the array
stripe segment size) in which IOPS are important (i.e., workloads
dominated by many small accesses: otherwise, bandwidth starts being the
primary concern), it's actually a pretty good assumption.


let me ruin this example.

I write 5 bytes to a raid5 array of 10 drives.

that's one operation to the host.

it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.

No, it's not. Assuming your write doesn't span a stripe or block boundary,


It doesn't matter what it spans or how small it is. Everything gets
rewritten. that's why raid5 is crap for write performance and especially
small writes.


Some people who don't have a clue at least have the sense to shut up
when corrected. Obviously, though, you're not one of them.


it's two reads and two writes, assuming nothing is cached.


100% correct, unless you're unlucky enough to find that the 5 bytes
cross a stripe-segment boundary (with the 64 KB segment size used as an
example later, the chances of this are a bit under 0.01%) in which case
it'll be three reads and three writes (unless the array size is small
enough to make another strategy more efficient).

...

No, it's two reads and two writes no matter how many drives are in the RAID-5 array.


OK, so I change 513 bytes of data from my host to a RAID-5 array with a 64 kB stripe size and 512-byte blocks. Please explain how that array gets updated with only two writes.


Leaving aside the slightly under 1% chance that the 513 bytes happen to
span a stripe segment boundary, you read the two sectors affected by the
update (if they're not already cached, which given that you're updating
them is quite likely), you read the corresponding two parity sectors
from the stripe's parity segment, you XOR the original 513 bytes with
the new bytes, you XOR the result into the corresponding bytes in the
parity segment in memory, you overwrite the original 513 bytes in the
data segment in memory with the changed data, you write back the two
sectors of the data segment, and you write back the two sectors of the
parity segment.

One or at most two small reads, plus two small writes. I suspect that
your understanding of how RAID-5 actually functions is seriously flawed,
but here's at least a start on correcting that situation (which of
course you should have attempted before making a fool of yourself, but
better late than never).


Unless you're some raid salesperson, who uses different definitions of
"operations" between the host and raid controller and then something else
between the raid controller and the disks themselves, I'm not following.

So try again.

I alter 513 bytes on a RAID-5 array, overwriting existing data. That's two 512-byte writes for the host.

How many writes take place between that RAID controller and the disks themselves, counted in 512-byte writes to any disk in that array?




#8 - March 24th 08, 01:26 AM - Bill Todd (posted to comp.arch.storage)

Cydrome Leader wrote:

....

Unless you're some raid salesperson, who uses different definitions of
"operations" between the host and raid controller and then something else
between the raid controller and the disks themselves, I'm not following.


That's your problem. I'd suggest finding a tutor, if you can't learn
how a conventional RAID-5 array works from the description that I
provided or from other easily-accessible on-line resources.

- bill
#9 - March 24th 08, 04:04 PM - Cydrome Leader (posted to comp.arch.storage)

Bill Todd wrote:
Cydrome Leader wrote:

...

Unless you're some raid salesperson, who uses different definitions of
"operations" between the host and raid controller and then something else
between the raid controller and the disks themselves, I'm not following.


That's your problem. I'd suggest finding a tutor, if you can't learn
how a conventional RAID-5 array works from the description that I
provided or from other easily-accessible on-line resources.

- bill


I noticed you cut out my question, probably on purpose.

I'll ask again.

I want to change two blocks on a RAID-5 array; the host is changing 513 bytes, so it's really doing two writes of 512 bytes each.

You and your buddy state that it only takes two writes to do this on RAID-5.

So, can you explain how it only takes two 512-byte writes to update data on a RAID-5 array where the change in user data is also two 512-byte blocks?





#10 - March 24th 08, 06:31 PM - [email protected] (posted to comp.arch.storage)

On Mar 22, 7:18 pm, Cydrome Leader wrote:
[email protected] wrote:
On Mar 21, 11:15 am, Cydrome Leader wrote:
davidhoffer wrote:
The IOPS the array can deliver is the aggregate of all the drives in the RAID group, less the hot spares and parity drives. It's more complicated than that, of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound


this is a bad assumption.


let me ruin this example.


I write 5 bytes to a raid5 array of 10 drives.


that's one operation to the host.


it's also at least 10 reads and 10 writes if you're using 10 drives, and
probably more depending on the stripe size of the raid array.


No, it's not. Assuming your write doesn't span a stripe or block boundary,


It doesn't matter what it spans or how small it is. Everything gets
rewritten. that's why raid5 is crap for write performance and especially
small writes.

it's two reads and two writes, assuming nothing is cached.


Note that a much larger write (the size of a stripe and so aligned),
can be done with a single set of writes.


As the write size approaches the stripe size, the ratio of overhead to
actual writes (the data the user sees and cares about) starts to get
closer to 1, but it still doesn't change that there's all sorts of busy
work going on.

we're looking at negative performance gains here in addition to an
incredible increase of operations inside that raid group.


No, it's two reads and two writes no matter how many drives are in the RAID-5 array.


OK, so I change 513 bytes of data from my host to a RAID-5 array with a 64 kB stripe size and 512-byte blocks. Please explain how that array gets updated with only two writes.



Your original example involved a five byte write, and my response was
clearly in the context of small random writes like that.

Let's see, 512 byte blocks, and a 64K stripe... OK, a RAID-5 array
with 129 disks in it. An unusual configuration to be sure, but we'll
run with it.

Assuming this stays within the stripe (the array isn't always laid out
that way), the update process then involves three reads, and three
writes. The two old blocks and the old parity block are read, and
then the updated blocks and recomputed parity block are written back.
Which is actually better (3X) than the (roughly) 4X performance hit
that a single block update incurs.

A larger sequential write (which you've done by specifying a 513 byte
write with 512 byte blocks) can be done with rather less overhead,
approaching 2X as you near the stripe size, and falling to a little
over 1X (writes only for the data plus a parity block) when the write
covers an entire stripe.

OTOH, assuming you mean something more reasonable like 4K blocks on a 64K stripe and the usual 512-byte *sectors* (although that's still pretty darn small for both dimensions), your two-sector update requires two reads of sequential pairs of sectors (the old data pair, and the matching old parity pair), and then a pair of two-sector sequential writes. Assuming, of course, you don't span the block or stripe boundary. And given the almost non-existent overhead of reading or writing a sequential pair of sectors compared to reading a single sector, that's invariably counted as a single I/O, and not two. Not to mention that reads, at least, are almost always done in bigger units than a sector anyway.

But in the end, your basic small random write requires two physical
reads and two physical writes, assuming no caching, no matter how many
disks in the RAID-5 array.
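
A minimal sketch of that backend I/O count in Python, under the same simplifying assumptions (no caching, the write stays within one stripe, and every affected block is counted as a separate I/O; the block size and data-disk count are just the example figures from this thread):

import math

def backend_ios(write_bytes, block_bytes, data_disks):
    stripe_bytes = block_bytes * data_disks
    if write_bytes >= stripe_bytes:
        # aligned full-stripe write: no reads, just the data plus new parity
        return 0, data_disks + 1
    # partial-stripe write: read-modify-write of the affected blocks + parity
    blocks = math.ceil(write_bytes / block_bytes)
    return blocks + 1, blocks + 1

print(backend_ios(513, 512, 128))     # -> (3, 3)   the 129-disk example above
print(backend_ios(512, 512, 128))     # -> (2, 2)   the classic small write
print(backend_ios(65536, 512, 128))   # -> (0, 129) a full-stripe write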


While it's difficult to do with great precision, some of us do like to do a bit of planning before buying the expensive storage box. Seriously, "just buy the DS8000, and then we'll figure out if it's the right size" isn't going to fly at a lot of places. And while IBM will likely help me do some benchmarking of a DS8000 before I buy it, the process is neither easy nor cheap, so I really don't want to be doing


If you're too lazy and cheap to do testing of an expensive storage unit,
maybe you shouldn't be the person testing and deciding what you buy in the
first place.

Considering the ds8000 series is so costly that you can't even buy it now
off the IBM site, it's pretty reasonable to assume IBM will jump through
some hoops to sell you one, even if that involves testing one out.



Sure, IBM will jump through hoops, but do you know how much work it is
to set up a realistic set of benchmarks for a complex system at an IBM
facility (or even if they ship you one to play with for a month at
your site)? And that's not work IBM can do. And as I stated in the
part you chose not to quote, that's not something I want to do too
much of, and I want to be pretty close before I start, and want to use
the "real" benchmark results only for fine tuning the configuration.
 



