A computer components & hardware forum. HardwareBanter

Newbie storage questions... (RAID5, SANs, SCSI)



 
 
  #1  
November 26th 03, 10:20 PM
David Sworder

Hi,

I'm reading a book that describes how to plan an SQL Server
installation. The book warns that one should never use RAID5 unless the
volume receives less than 10% writes (i.e. 90% reads). Apparently the
performance penalty for data writes is quite high with RAID5 but I'm having
trouble understanding exactly what the penalty is. Consider the following
example:

- Let's say it takes x seconds to write a chunk of data to a single hard
drive in a single IO operation.

    - Now let's say that I have 3 of these drives in a RAID5 array and I
want to write the same chunk of data. Instead of using a single IO
operation, four operations are now involved, because to write a block of data
to a drive, the RAID controller must:
1) read the existing block of data on the target drive
2) read the existing parity block on the parity drive
...calculate the new parity and then...
3) write the new block of data to the target drive
4) write the new parity block to the parity drive.
...although there are now 4 operations, the operations are spread over 3
drives. So the time to perform this operation is [4x/3] seconds, that is to
say, 1.33x seconds or 33% longer than it would take to write to a single
drive. Using this logic, if I had 4 drives the write speed would be
identical to writing to a single drive. Only when I have more than four
drives is the write time of the RAID5 volume superior to writing to a single
drive. Is this logic correct? This is how my book describes it, but in
practice RAID5 doesn't seem to be as slow as this. Could someone please
confirm?
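
    To make my assumption explicit, here's the naive model I'm using, written
out as a quick Python sketch (this is just my reasoning above, so it may well
be exactly the part that's wrong):

def raid5_write_time(x, num_drives):
    # time for one small write under this naive model:
    # 4 operations spread evenly across num_drives disks
    return 4.0 * x / num_drives

x = 1.0                                   # normalize: one single-drive write = 1 unit
for n in (3, 4, 5, 8):
    print(n, "drives:", round(raid5_write_time(x, n), 2), "x the single-drive time")
# 3 drives -> 1.33x, 4 drives -> 1.0x, 5 and up -> faster than a single drive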

Now let me ask you a question about bandwidth for data transfer from
hard drives. Consider the following device that Dell sells which appears to
be a stand-alone rack-mountable RAID device:


http://www1.us.dell.com/content/prod...555&l=en&s=biz

    This thing costs nearly $12,000, and yet its bandwidth of 200MB/sec is
inferior to standard SCSI, which is 320MB/sec. My question is: What is the
advantage of this device over traditional SCSI RAID?

One last question: The SQL Server seems to be a single point of failure.
If the motherboard or power supply in the SQL Server machine goes down, my
entire application will go down. Is it possible to set things up in such a
way that TWO machines running SQL Server can be attached to the same
harddisk? If one SQL machine dies, the other machine automatically comes
online? In this scenario, where is the harddrive located? It can't be in one
of the machines because if that machine were the one that crashed, the other
SQL Server machine would be unable to access that drive! That's why I'm
wondering if that Dell external RAID solution (see above) might be
appropriate for me. What do you think?

Thanks,

David



  #2  
November 26th 03, 11:16 PM
Nik Simpson

David Sworder wrote:
Hi,

I'm reading a book that describes how to plan an SQL Server
installation. The book warns that one should never use RAID5 unless
the volume receives less than 10% writes (i.e. 90% reads).
Apparently the performance penalty for data writes is quite high with
RAID5 but I'm having trouble understanding exactly what the penalty
is. Consider the following example:


A lot of the "don't use RAID5" advice for databases relates to software RAID
implementations on the host, where instead of performing a single I/O
operation the host has to perform the multiple operations and also
calculate the parity. If you offload the RAID functions to an external
subsystem (especially a relatively modern one with plenty of write-back
cache), the write performance of RAID5 will almost certainly be more than
adequate for your needs. The caveat of "almost" is there because you don't
mention any specific performance requirements.

This is how my book describes it, but in practice RAID5 doesn't seem
to be as slow as this. Could someone please confirm?


RAID-5 will certainly be slower than, say, RAID 0+1; it's just the nature of
the beast, but its performance is more than adequate for an awful lot of
applications, otherwise it wouldn't be so popular. A common approach with
databases is to put the more write-intensive portions (like transaction
logs) onto a RAID level that handles writes better (say, mirror a couple of
drives and use those for the transaction log device) and use RAID5 for the
less write-intensive data tables.


Now let me ask you a question about bandwidth for data transfer
from hard drives. Consider the following device that Dell sells which
appears to be a stand-alone rack-mountable RAID device:



http://www1.us.dell.com/content/prod...555&l=en&s=biz

This thing costs nearly $12,000, and yet its bandwidth of
200MB/sec is inferior to standard SCSI, which is 320MB/sec. My
question is: What is the advantage of this device over traditional
SCSI RAID?


First, that 200MB/s figure is misleading: FC is a duplex protocol, so it can do
200MB/s in each direction simultaneously, which makes it really more like 400MB/s.
Second, in a database application you'll almost certainly never see 200MB/s
of throughput, let alone 320MB/s, so raw throughput is a poor measure. What
you are really interested in is I/O operations/sec, and this is where the Fibre
Channel protocol is much more efficient in how it uses the available bandwidth:
it can set up and tear down an I/O transaction much more quickly than typical
parallel SCSI, allowing it to handle more transactions/sec for a given amount
of bandwidth.
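
To put some rough numbers on that, here's a quick back-of-the-envelope sketch;
the 8 KB page size and the IOPS figures are purely illustrative assumptions,
not measurements of any particular controller:

# Purely illustrative: how much of a 200MB/s link small random I/O actually uses.
PAGE_KB = 8               # assumed database page size
LINK_MB_PER_SEC = 200     # nominal FC bandwidth in one direction

def mb_per_sec(iops, page_kb=PAGE_KB):
    return iops * page_kb / 1024.0

for iops in (500, 2000, 10000):
    used = mb_per_sec(iops)
    print("%6d IOPS x %d KB = %6.1f MB/s (%4.1f%% of the link)"
          % (iops, PAGE_KB, used, 100.0 * used / LINK_MB_PER_SEC))

Even at random-I/O rates a database is unlikely to reach, the link itself is
nowhere near saturated, which is why the IOPS figure matters more than the
headline bandwidth.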

There are other benefits to using FC in terms of clustering and other
advanced functions which brings me to...


One last question: The SQL Server seems to be a single point of
failure. If the motherboard or power supply in the SQL Server machine
goes down, my entire application will go down. Is it possible to set
things up in such a way that TWO machines running SQL Server can be
attached to the same harddisk? If one SQL machine dies, the other
machine automatically comes online? In this scenario, where is the
harddrive located? It can't be in one of the machines because if that
machine were the one that crashed, the other SQL Server machine would
be unable to access that drive! That's why I'm wondering if that Dell
external RAID solution (see above) might be appropriate for me. What
do you think?


Microsoft has a multinode clustering feature for its operating systems and
applications to address just this problem; do a search on Microsoft's
website for clustering and you'll find plenty of information. But in order to
cluster, you'll need a shareable subsystem, and today that pretty much means a
Fibre Channel-attached subsystem like the one you mentioned earlier.
Theoretically, you can do the same with SCSI, but it's usually ugly and I
don't know if MS still supports any external SCSI subsystems for clustering
applications.


--
Nik Simpson


  #3  
November 27th 03, 04:51 AM
David Sworder

First, that 200MB/s figure is misleading: FC is a duplex protocol, so it can do
200MB/s in each direction simultaneously, which makes it really more like 400MB/s.
Second, in a database application you'll almost certainly never see 200MB/s
of throughput, let alone 320MB/s, so raw throughput is a poor measure. What
you are really interested in is I/O operations/sec, and this is where the Fibre
Channel protocol is much more efficient in how it uses the available bandwidth:
it can set up and tear down an I/O transaction much more quickly than typical
parallel SCSI, allowing it to handle more transactions/sec for a given amount
of bandwidth.


Thanks Nik,

That was a very helpful post. After reading some of your other posts on
this newsgroup (via Google), it's clear to me that I/O operations/sec are
more important to me than total bandwidth. What's odd is that the SCSI
manufacturers such as Adaptec don't list IOs/sec on their spec sheets which
makes it difficult to compare apples to apples. How would one go about
finding the IOs/sec for a SCSI RAID card? That fibre channel unit that I
mentioned in my original post handles 40,000 IOs/sec according to the Dell
site. I'd like to see how high-end SCSI RAID cards compare.

The next logical question in this little learning exercise of mine is:
What exactly defines an IO operation? Maybe an example would clarify my
question...

Let's say I have a machine with one hard drive. SQL Server runs on that
machine and attempts to read 4,000 bytes of data sequentially from the
drive. This would count as one IO operation correct? Now let's change the
situation so that instead of one harddrive, I have that cool fibre channel
device that I mentioned in my previous post. It has 10 drives configured as
RAID 1+0. In this situation, when SQL Server attempts to read 4,000 bytes,
Windows still thinks of this as 1 IO operation -- but does the FC device
consider this to be 1 operation? In order to read 4,000 bytes, the FC device
is accessing all 10 disks simultaneously. So is this operation considered as
*ten* IOs, or only one?

I'm asking this question because my book states that I should closely
monitor the IOs/second in 'perfmon' and make sure that it does not exceed
85% of the maximum throughput. So if the FC RAID device above supports a
maximum of 40,000 IOs/second, I'd want to make sure that I don't exceed
34,000 IOs/second on a regular basis. Tracking IOs/second is easy using the
Windows PerfMonitor, but I'm not sure if PerfMon is tracking the correct
value since Windows has no way of knowing that each IO read/write will
trigger multiple read/writes across the various drives in the array. Do you
see what I mean? So what exactly defines an "IO" on a RAID device, be it a
SCSI RAID or an FC RAID?

Thanks,

David





  #4  
November 27th 03, 09:22 AM
Robert Wessel

"David Sworder" wrote in message ...
That was a very helpful post. After reading some of your other posts on
this newsgroup (via Google), it's clear to me that I/O operations/sec are
more important to me than total bandwidth. What's odd is that the SCSI
manufacturers such as Adaptec don't list IOs/sec on their spec sheets which
makes it difficult to compare apples to apples. How would one go about
finding the IOs/sec for a SCSI RAID card? That fibre channel unit that I
mentioned in my original post handles 40,000 IOs/sec according to the Dell
site. I'd like to see how high-end SCSI RAID cards compare.



It's usually a non-issue. You're grossly limited by the drive
subsystem, which will rarely allow more than a few hundred random I/Os
per second. Sequential I/O can run significantly faster, but only
rarely do you do long sequential writes on a database. On sequential
reads (for instance during a table scan), most SCSI (RAID or
otherwise) can pretty well keep up with the disk drives.

Just to put a number on things, let's say you've got a RAID 5 array of
five drives with 5ms typical access time (pretty optimistic). So each
drive can do about 200 I/Os per second. So you could sustain
something like 1000 random reads per second (with zero writes), or 250
writes (with zero reads). Or at something like 80% reads, a total of
625 (mixed) I/Os per second.
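
If it helps, that arithmetic can be written out in a few lines. A rough sketch
only, assuming the classic four-I/O read-modify-write penalty for small RAID-5
writes and ignoring caching entirely:

# Back-of-the-envelope RAID-5 random I/O budget, using the assumptions above:
# 5 drives, 5 ms per random access, 4 back-end I/Os per host write, no cache.
DRIVES = 5
ACCESS_TIME_S = 0.005                     # 5 ms per random access (optimistic)
PER_DRIVE_IOPS = 1 / ACCESS_TIME_S        # ~200 I/Os per second per drive
BACKEND_BUDGET = DRIVES * PER_DRIVE_IOPS  # ~1000 back-end I/Os per second
WRITE_PENALTY = 4                         # read data, read parity, write data, write parity

def host_iops(read_fraction):
    # each host read costs 1 back-end I/O, each host write costs WRITE_PENALTY
    cost = read_fraction * 1 + (1 - read_fraction) * WRITE_PENALTY
    return BACKEND_BUDGET / cost

print(host_iops(1.0))   # 1000.0  (all reads)
print(host_iops(0.0))   # 250.0   (all writes)
print(host_iops(0.8))   # 625.0   (80% reads / 20% writes)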

With extensive caching at the RAID controller you can get higher
numbers, but on modest sized systems like the one you're discussing,
it's almost always a better idea to put extra cache into the server
instead of the disk subsystem.

FC HBAs are often attached to very large disk arrays (sometimes
thousands of drives), where the number of I/Os per second the *HBA*
can sustain can become a limiting factor. It's rare to see a SCSI
RAID subsystem that can support more than a couple of dozen drives
which puts a pretty low upper limit on the number of random I/Os per
second. In any event, vendors quote raw I/Os per second for FC HBAs
because that's all they can measure, as there's no disk involved. The
vendor of the disk subsystem that you attach to your FC HBA will have a
set of performance numbers; those can be compared to the SCSI RAID
controller performance figures.

On a small system, the FC HBA will simply not be the bottleneck for
random I/Os, the drive array *will* be the bottleneck. It doesn't
matter that your HBA can do 40,000 I/Os per second if it's only
talking to a drive subsystem with a dozen disks that can hit 1500 IO/s
only with a tailwind. OTOH, hang 30 of those subsystems on your SAN,
and you're going to be limited by the HBA (assuming you've only got
the one host).
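
Put another way, the usable random-I/O rate is roughly the smaller of what the
HBA can push and what the disks behind it can deliver. A trivial sketch of that,
using the illustrative figures from this thread (the per-drive number is an
assumption, not a measurement):

HBA_IOPS_LIMIT = 40000    # vendor figure for the FC HBA (from the Dell spec)
DRIVE_IOPS = 125          # rough random IOPS per drive (assumed)

def array_random_iops(num_drives):
    # the bottleneck is whichever limit you hit first
    return min(HBA_IOPS_LIMIT, num_drives * DRIVE_IOPS)

print(array_random_iops(12))    # 1500  - a dozen disks, nowhere near the HBA limit
print(array_random_iops(400))   # 40000 - hundreds of disks, now the HBA is the limit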


The next logical question in this little learning exercise of mine is:
What exactly defines an IO operation? Maybe an example would clarify my
question...

Let's say I have a machine with one hard drive. SQL Server runs on that
machine and attempts to read 4,000 bytes of data sequentially



SQL Server (and most other DBMSs) will typically format its data files
in blocks that are some power-of-2 multiple of 4KB. It will then read
or write those blocks as units (or in groups of blocks if it can).


from the
drive. This would count as one IO operation correct? Now let's change the
situation so that instead of one harddrive, I have that cool fibre channel
device that I mentioned in my previous post. It has 10 drives configured as
RAID 1+0. In this situation, when SQL Server attempts to read 4,000 bytes,
Windows still thinks of this as 1 IO operation -- but does the FC device
consider this to be 1 operation? In order to read 4,000 bytes, the FC device
is accessing all 10 disks simultaneously. So is this operation considered as
*ten* IOs, or only one?

I'm asking this question because my book states that I should closely
monitor the IOs/second in 'perfmon' and make sure that it does not exceed
85% of the maximum throughput. So if the FC RAID device above supports a
maximum of 40,000 IOs/second, I'd want to make sure that I don't exceed
34,000 IOs/second on a regular basis. Tracking IOs/second is easy using the
Windows PerfMonitor, but I'm not sure if PerfMon is tracking the correct
value since Windows has no way of knowing that each IO read/write will
trigger multiple read/writes across the various drives in the array. Do you
see what I mean? So what exactly defines an "IO" on a RAID device, be it a
SCSI RAID or an FC RAID?



Loosely, an I/O is a single read or write operation. The size is
context dependent, but in the case of a DBMS it's going to typically
be the page size for the table or table space. If you've got hardware
RAID, the host will see a single I/O for either a read or a write.
The RAID controller will issue multiple I/Os to the attached devices
as necessary. For example, let's say you have a FC HBA in the host
connected to a RAID disk subsystem that's got a bunch of disk drives
on an internal SCSI channel. Your database writes a disk page. There
will be a single I/O across the fiber from the host to the RAID
controller, and then (assuming no fortuitous caching) *four* I/Os
across the internal SCSI channel from the RAID controller to the disk
drives.
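
So one host-visible I/O can fan out into several back-end I/Os, and how many
depends on the RAID level and on whether it's a read or a write. A small sketch
of that fan-out (a simplified model: small random I/Os, no caching, no
partial-stripe optimizations):

# Rough back-end cost of one small host I/O, by RAID level.
def backend_ios(raid_level, op):
    if op == "read":
        return 1               # a small read touches one data disk
    if raid_level == "raid0":
        return 1               # no redundancy: just the data write
    if raid_level == "raid1":
        return 2               # write the data to both mirror sides
    if raid_level == "raid5":
        return 4               # read data, read parity, write data, write parity
    raise ValueError("unknown RAID level: %s" % raid_level)

print(backend_ios("raid5", "write"))   # 4, as in the example above
print(backend_ios("raid1", "write"))   # 2
print(backend_ios("raid5", "read"))    # 1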
  #5  
November 27th 03, 05:20 PM
David Sworder

This is really great information. I apologize for the basic questions,
but I've only been examining this stuff for the better part of one day.
Let me ask you a few follow ups...

Just to put a number on things, let's say you've got a RAID 5 array of
five drives with 5ms typical access time (pretty optimistic). So each
drive can do about 200 I/Os per second. So you could sustain
something like 1000 random reads per second (with zero writes)....


    I don't quite understand this concept. You've got five drives, each of
which can handle 200 I/Os per second. You're multiplying 5*200 to get 1000
IOPs for the array. I understand your calculation, but I'm not sure why it
works as you state. In a trivial example, let's say the RAID controller is
instructed to read 5 bytes of data. This is considered one IO by the RAID
controller, but doesn't the RAID controller then have to issue *5* read
commands, one to each disk? My understanding of RAID (as it applies to
reading data) is that the 5 disks would always be accessed simultaneously in
order to speed up the read process. So for each IO read-request that the
RAID controller receives, it has to issue 5 IO requests, one to each drive.
So it seems that the RAID controller would *still* be limited to 200 IOPs,
regardless of how many drives are in the array. Why is it that you say
the RAID controller can actually handle 1000 IOPs? I don't understand.

With extensive caching at the RAID controller you can get higher
numbers, but on modest sized systems like the one you're discussing,
it's almost always a better idea to put extra cache into the server
instead of the disk subsystem.


When you say that the extra cache should be put in the server but not on
the RAID controller or disk subsystem, what do you mean exactly? Where in
the server would I want to increase the cache?

FC HBAs are often attached to very large disk arrays (sometimes
thousands of drives), where the number of I/Os per second the *HBA*
can sustain can become a limiting factor. It's rare to see a SCSI
RAID subsystem that can support more than a couple of dozen drives
which puts a pretty low upper limit on the number of random I/Os per
second. In any event, vendors quote raw I/Os per second for FC HBAs
because that's all they can measure, as there's no disk involved. The
vendor of the disk subsystem that you attach to your FC HBA will have a
set of performance numbers; those can be compared to the SCSI RAID
controller performance figures.


    Ah, ok... This clarifies things a bit. I think I now have a basic
understanding of what an HBA is. So an HBA is a "host bus adapter." It lives
in the server [in a PCI slot I assume]. This HBA has no idea how many drives
are in the array. It just passes I/O requests over a 2Gb/s fibre cable. It
can pass up to 40,000 of these requests/second [using the example from the
Dell site in my previous post]. At the other end of the cable is that
rack-mountable box containing all of the drives. Are you saying that the
real brains of the RAID lie within that box instead of the HBA card? So I
really need to be asking myself "how many IOPs can those drives handle,"
because it's the IOPs limitation of the DRIVES, not the HBA card, that is my
bottleneck. Is this correct?

Loosely, an I/O is a single read or write operation.[...]


I think I understand your explanation, but again, see my first question
above. In the simpler case of doing a single *read* operation, is the single
I/O request actually morphed into X number of requests where X is the number
of drives in the array since each drive will have to be touched in order to
perform the read?

David


  #6  
November 27th 03, 05:58 PM
Hans Jørgen Jakobsen

On Thu, 27 Nov 2003 16:20:23 GMT, David Sworder wrote:
This is really great information. I apologize for the basic questions,
but I've only been examining this stuff for the better part of one day.
Let me ask you a few follow ups...

Just to put a number on things, let's say you've got a RAID 5 array of
five drives with 5ms typical access time (pretty optimistic). So each
drive can do about 200 I/Os per second. So you could sustain
something like 1000 random reads per second (with zero writes)....


I don't quite understand this concept. You've got five drives, each of
which can handle 200 I/Os per second. You're multiplying 5*200 to get 1000
IOPs for the array. I understand your calculation, but I'm not sure why it
works as you state. In a trivial example, let's say the RAID controller is
instructed to read 5 bytes of data. This is considered one IO by the RAID
controller, but doesn't the RAID controller then have to issue *5* read
commands, one to each disk? My understanding of RAID (as it applies to
reading data) is that the 5 disks would always be accessed simultaneously in
order to speed up the read process. So for each IO read-request that the
RAID controller receives, it has to issue 5 IO requests, one to each drive.
So it seems that the RAID controller would *still* be limited to 200 IOPs,
regardless of how many drives are in the array. Why is it that you say
the RAID controller can actually handle 1000 IOPs? I don't understand.

Here comes the term "stripe size".
This is the number of consecutive bytes allocated on the same disk.
Depending on your performance requirements you will choose a small or large
stripe size (8k-64k, or even much larger).

/hjj
  #7  
November 27th 03, 06:14 PM
David Sworder

Here comes the term "stripe size".
This is the number of consecutive bytes allocated on the same disk.
Depending on your performance requirements you will choose a small or large
stripe size (8k-64k, or even much larger).


Ha! Just when I think I'm beginning to get a handle on things, a new
term/concept comes along that reveals just how ignorant I really was (am).


Ok... "stripe size"... So in a RAID array of 5 disks with a stripe size of
8k, if I submit a request to the RAID controller to write 5,000 bytes, these
bytes will not be scattered equally across all drives? Since the size of the
data being written is less than the stripe size, all of the data could
conceivably written to one disk?


  #8  
November 27th 03, 06:42 PM
Bill Todd


"David Sworder" wrote in message
...
This is really great information. I apologize for the basic questions,
but I've only been examining this stuff for the better part of one day.
Let me ask you a few follow ups...

Just to put a number on things, let's say you've got a RAID 5 array of
five drives with 5ms typical access time (pretty optimistic). So each
drive can do about 200 I/Os per second. So you could sustain
something like 1000 random reads per second (with zero writes)....


Just to be a bit more complete:

As Robert noted, 5 ms for an average single random access is a bit
optimistic: the fastest current 15Krpm drives take about 5.5 ms, 10Krpm
drives take more like 7 - 8 ms, and 7200 rpm ATA drives take 12 - 13 ms.

However, that's for requests submitted serially, such that one request is
satisfied before the next is submitted. If the workload performs many tasks
in parallel, such that multiple requests can be submitted without waiting for
any to complete (as FC and SCSI disks allow but most ATA disks do not - yet),
the average latency goes up (because every request but the first one satisfied
is waiting in a queue) but the throughput does as well (because the disk can
pick an optimal order in which to satisfy them that minimizes the latency
between them). If your request stream has sufficient parallelism, the
throughput of an individual disk can easily double, though each request will
on average see much more latency than it would in a serial stream; so if
individual response times are critical, spreading the requests across a larger
array will improve them even though the per-disk throughput will decrease.
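
One way to see the trade-off is Little's law: throughput is roughly the number
of requests outstanding divided by the average response time. A rough sketch
with made-up service times (the shape of the result is the point, not the
exact values):

def throughput(outstanding, avg_response_s):
    # Little's law: completions per second = requests in flight / average response time
    return outstanding / avg_response_s

# Serial submission: one request at a time, ~8 ms each.
print(throughput(1, 0.008))    # 125 I/Os per second

# Deep queue: the drive reorders to shorten seeks, finishing one every ~4 ms,
# but with 8 outstanding each request waits ~32 ms on average.
print(throughput(8, 0.032))    # 250 I/Os per second - double the serial rate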


I don't quite understand this concept. You've got five drives, each of
which can handle 200 I/Os per second. You're multiplying 5*200 to get 1000
IOPs for the array. I understand your calculation, but I'm not sure why it
works as you state. In a trivial example, let's say the RAID controller is
instructed to read 5 bytes of data. This is considered one IO by the RAID
controller, but doesn't the RAID controller then have to issue *5* read
commands, one to each disk? My understanding of RAID (as it applies to
reading data) is that the 5 disks would always be accessed simultaneously in
order to speed up the read process. So for each IO read-request that the
RAID controller receives, it has to issue 5 IO requests, one to each drive.
So it seems that the RAID controller would *still* be limited to 200 IOPs,
regardless of how many drives are in the array. Why is it that you say
the RAID controller can actually handle 1000 IOPs? I don't understand.


As already noted, most RAID implementations do not work this way: instead,
data is spread across the disks in the array in coarser chunks - usually no
smaller than 4 KB per disk, often 64 KB per disk, and there are good reasons
in most workloads to make them even larger. Some early implementations of
RAID-3 distributed the data at finer grain (much as you describe above), but
I've never heard of RAID-0, -1, -4, or -5 doing so.


With extensive caching at the RAID controller you can get higher
numbers, but on modest sized systems like the one you're discussing,
it's almost always a better idea to put extra cache into the server
instead of the disk subsystem.


When you say that the extra cache should be put in the server but not on
the RAID controller or disk subsystem, what do you mean exactly? Where in
the server would I want to increase the cache?


Just adding server RAM will normally suffice: the operating system should
put it to good use caching data for most workloads, though a few (those that
perform lots of small writes and require that each complete before the next
is submitted) might better benefit from cache in the array controller.

Having *some* cache in the controller that allows it to defer disk writes
until a convenient opportunity (and hence significantly decrease their
overhead) is desirable, though. It must be non-volatile (such that its
contents aren't lost if power fails: some people trust a simple external
UPS to suffice here, but having a back-up battery right on the array cache
card tends to be safer), and to provide safety equivalent to the RAID array
behind it, it really needs to be duplicated (otherwise, it becomes a single
point of failure).

- bill



  #9  
November 27th 03, 06:52 PM
Nik Simpson

David Sworder wrote:
Here comes the term "stripe size".
This is the number of consecutive bytes allocated on the same disk.
Depending on your performance requirements you will choose a small or
large stripe size (8k-64k, or even much larger).


Ha! Just when I think I'm beginning to get a handle on things, a new
term/concept comes along that reveals just how ignorant I really was
(am).

Ok... "stripe size"... So in a RAID array of 5 disks with a stripe
size of 8k, if I submit a request to the RAID controller to write
5,000 bytes, these bytes will not be scattered equally across all
drives? Since the size of the data being written is less than the
stripe size, all of the data could conceivably be written to one disk?


You nailed it. Stripe size is the minimum size of a write to a physical disk
in the array. Trying to allocate evenly at the byte level to each disk would
be insane in terms of the effect on performance.
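
If it helps to see the mapping, here's a toy sketch of how a request gets
carved up by chunk; it's a simplification (real controllers vary, and parity
placement is ignored entirely), but it shows why a small write can land on a
single disk:

STRIPE_SIZE = 8 * 1024   # bytes written to one disk before moving to the next
DATA_DISKS = 5

def disks_touched(offset, length):
    first_chunk = offset // STRIPE_SIZE
    last_chunk = (offset + length - 1) // STRIPE_SIZE
    return sorted(set(chunk % DATA_DISKS for chunk in range(first_chunk, last_chunk + 1)))

print(disks_touched(0, 5000))        # [0]             - a 5,000-byte write fits in one chunk
print(disks_touched(0, 64 * 1024))   # [0, 1, 2, 3, 4] - a big write spans every disk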


--
Nik Simpson


  #10  
November 28th 03, 04:22 PM
Jesper Monsted

"Nik Simpson" wrote in
:
Ok... "stripe size"... So in a RAID array of 5 disks with a stripe
size of 8k, if I submit a request to the RAID controller to write
5,000 bytes, these bytes will not be scattered equally across all
drives? Since the size of the data being written is less than the
stripe size, all of the data could conceivably written to one disk?


You nailed it. Stripe size is the minimum size of a write to a
physical disk in the array. Trying allocate evenly at the byte level
to each disk would be insane in terms of the effect on performance.


Unless you're using RAID3, where the stripe size is basically one bit.

--
/Jesper Monsted
 



