#1
EMC Clariion w/ SATA disks purchase advice please
Hi,
my company is looking at using a Clariion with SATA disks for research-quality storage. This means we want cheap, but still good performance. The nature of the application is to do sequential reads of many flat files averaging 3 GB each, say 1 TB at a time, so cache on the controller is useless. The raw disk throughput is more important. Write speed is important, but not as important as read. There are two things I am concerned about with this Clariion/SATA solution:

1. Clariion's SATA drives are 5400 rpm, compared to Western Digital's 7200 rpm.
2. The Clariion's internal signalling is FC, so the SATA disks attach through a "translator" (can someone explain this?), and this further reduces the SATA disks' native speed.

Can someone who has experience help me out? Any story of your setup and experience is deeply appreciated.

Clayton
#2
- C - wrote:
> Hi, my company is looking at using a Clariion with SATA disks for research-quality storage. This means we want cheap, but still good performance. The nature of the application is to do sequential reads of many flat files averaging 3 GB each, say 1 TB at a time, so cache on the controller is useless.

Not necessarily. If the OS/HW combination can't issue read requests fast enough to saturate the raw bandwidth, then read-ahead into the controller cache may improve things. It'll depend a bit on the data access patterns and block sizes, but don't rule out the possible benefits of cache even for sequential reads. BTW, it might be worth stating the required read performance.

> The raw disk throughput is more important. Write speed is important, but not as important as read. There are two things I am concerned about with this Clariion/SATA solution: 1. Clariion's SATA drives are 5400 rpm, compared to Western Digital's 7200 rpm.

Or even 10K for the WD 36 GB. But generally, rotational speed is an important factor for random IOPS; it has less impact on performance for sequential I/O.

> 2. The Clariion's internal signalling is FC, so the SATA disks attach through a "translator" (can someone explain this?), and this further reduces the SATA disks' native speed.

A controller that offers an external interface different to the drives' native interface (i.e. FC vs. SATA) is nothing new; it's been done with SCSI, SSA and PATA in the past. Put simply, the incoming I/O request is received by the controller, which converts it into the necessary SATA disk I/O request(s) and passes it on to the disk(s). When the data from the disk is received by the controller, it re-packages it as an FC I/O and sends the data back to the host. Given that the FC interface is faster than the SATA interface, it's unlikely that the FC interface will be a significant bottleneck.

> Can someone who has experience help me out? Any story of your setup and experience is deeply appreciated.

As I said earlier, it would be helpful if you could state your performance requirements in terms of MB/s, as that would help determine whether the solution is workable. BTW, are you talking about the high-end (CX-200 etc.) Clariions that allow you to use SATA drives in an add-on chassis behind a first chassis that contains FC drives, or the newly announced AX product that is a pure FC-SATA solution?

--
Nik Simpson
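To make the "translator" idea concrete, here is a minimal conceptual sketch (in Python) of what such an FC-to-SATA bridge does. The function names, request format and per-command transfer limit are invented for illustration; this is not EMC's actual firmware interface.

    # Hypothetical sketch: an FC read addressed to a LUN is mapped onto one or
    # more native SATA commands, and the returned data is repackaged for the
    # FC host. Names and limits are illustrative, not EMC's implementation.
    MAX_SATA_XFER_SECTORS = 256   # assume the bridge splits large transfers

    def translate_fc_read(lun, start_lba, sector_count, sata_read):
        """Convert one FC read into SATA command(s) and gather the data."""
        data = bytearray()
        lba, remaining = start_lba, sector_count
        while remaining > 0:
            chunk = min(remaining, MAX_SATA_XFER_SECTORS)
            data += sata_read(lun, lba, chunk)   # issue the native SATA command
            lba += chunk
            remaining -= chunk
        return bytes(data)                       # sent back to the host over FC

As long as the per-command overhead of that conversion is small relative to the disk's transfer time, the translation itself should not dominate, which is the point made above.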
#3
Hi Nik,
thanks for the reply. The plan is to attach a few Linux boxes to a Clariion CX700 via 2 Gb HBA cards. We have large flat files that total 12 TB. The programs will fire off on the Linux boxes and sequentially read a few TB each. The apps will read as fast as the data can come in, so the read requirement is really as fast as the spindles can move. 10 TB at 40 MB/s will take about 2.9 days, at 80 MB/s about 1.4 days, so the faster the better... although we don't have the $$ to go FC. I mentioned the WD 7200 rpm 250 GB drive because it's the comparable drive size in the Clariion; buying 10k rpm 36 GB drives will be too costly for me. I read on another thread that someone mentioned the translation being a bottleneck - have you seen this at all? Thanks!

Clayton
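A quick back-of-the-envelope check of those scan times (a rough estimate only - it ignores filesystem overhead, seeks between files, and contention from concurrent readers):

    def scan_days(dataset_tb, throughput_mb_s):
        """Days needed to read a dataset sequentially at a sustained rate."""
        dataset_mb = dataset_tb * 1_000_000     # decimal TB -> MB
        return dataset_mb / throughput_mb_s / 86_400

    for rate in (40, 80):
        print(f"10 TB at {rate} MB/s ~ {scan_days(10, rate):.1f} days")
    # 10 TB at 40 MB/s ~ 2.9 days
    # 10 TB at 80 MB/s ~ 1.4 days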
#4
- C - wrote:
> Hi Nik, thanks for the reply. The plan is to attach a few Linux boxes to a Clariion CX700 via 2 Gb HBA cards. We have large flat files that total 12 TB.

How will that be distributed, i.e. will there be one filesystem shared between all the Linux boxes, or will there be separate LUNs carved out for each of the Linux boxes? BTW, I think you'll find cache much more helpful in this scenario than you might think, because of the need to support multiple hosts reading from different parts of the array: what looks like sequential access to each host may well end up looking far from sequential to the array.

> The programs will fire off on the Linux boxes and sequentially read a few TB each. The apps will read as fast as the data can come in, so the read requirement is really as fast as the spindles can move. 10 TB at 40 MB/s will take about 2.9 days, at 80 MB/s about 1.4 days, so the faster the better... although we don't have the $$ to go FC.

Have you looked at the new Clariion AX100 announced this week? It is a pure FC-SATA box and it also supports the 7200 rpm drives. The SATA implementation in the CX-700 requires the controller head and a set of FC drives, and it's much more expensive. You could probably afford to spread the load over several AX100s, each with dual controllers, for the same amount you'd pay for the CX-700 solution, which is probably overkill for what you want. Raw throughput on the AX100 fully populated is rated at 300 MB/s for large-block sequential read; even if you only see half of that, combined with the need for four of the arrays (max 3 TB for each AX100), you'll be able to get 600 MB/s overall, which isn't bad. Take a look at: http://www.emc.com/products/systems/...a_spec_ldv.pdf

Just for the record, I don't work for EMC and have no axe to grind.

> I mentioned the WD 7200 rpm 250 GB drive because it's the comparable drive size in the Clariion; buying 10k rpm 36 GB drives will be too costly for me. I read on another thread that someone mentioned the translation being a bottleneck - have you seen this at all?

I think the bottleneck that people refer to is more in comparison to a pure FC configuration of the CX-700; either way, I don't think the overhead of the FC-SATA translation is going to be a big problem in this application. But I do urge you to take a look at the AX if what you want is an FC-SATA solution. BTW, if you want all the Linux boxes to share the same array (or storage pool) and you can't afford pure FC, you really don't have much of an option other than FC-SATA or FC-SCSI, so the overhead involved in the translation is moot.
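Spelling out the aggregate-throughput estimate above (this just re-derives the figures from the stated assumptions: rated throughput, a 50% derating, and capacity per box):

    rated_mb_s = 300      # AX100 large-block sequential read, per the figure above
    derate = 0.5          # assume only half the rated number is achieved
    capacity_tb = 3       # usable capacity per AX100, per the figure above
    dataset_tb = 12

    arrays_needed = -(-dataset_tb // capacity_tb)     # ceiling division -> 4
    aggregate = arrays_needed * rated_mb_s * derate   # 4 * 150 = 600 MB/s
    print(arrays_needed, "arrays,", aggregate, "MB/s aggregate")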
#5
Please let us know your experiences with this system.
We are using a CX-400 with both FC-SCSI and SATA disks. The SATA disks (10 disks in a RAID group) have a throughput of 4 MB/s max. Imagine that. I have demanded an investigation from EMC, who set this thing up. They just say that these disks are really slow. But hey, according to the specs they should reach about 35 to 60 MB/s per disk. Something is totally misconfigured here, and I have no help from EMC and no clue about the reason.

Bye
Rick Denoire
#6
"Rick Denoire" wrote in message ...

> Please let us know your experiences with this system. We are using a CX-400 with both FC-SCSI and SATA disks. The SATA disks (10 disks in a RAID group) have a throughput of 4 MB/s max. Imagine that. I have demanded an investigation from EMC, who set this thing up. They just say that these disks are really slow.

There certainly used to be people at EMC far more competent than such an answer indicates. If you're talking with them, they're lying to you; if not, find someone competent to talk with.

> But hey, according to the specs they should reach about 35 to 60 MB/s per disk. Something is totally misconfigured here, and I have no help from EMC and no clue about the reason.

It used to be that the combination of a disabled disk write-back cache plus a limitation to smallish request sizes (sometimes as small as 32 KB per request, depending upon the assumptions the controller made about disk behavior, since some parallel ATA disks got flaky at larger transfer sizes - see the Linux ATA driver source code for examples) could cause each request to miss a complete disk revolution. With a 7200 rpm disk, at 8+ ms/rev, that works out to just about 4 MB/s. But even PATA request-size limits got increased (and unnecessary vendor-specific problems in this area fixed), and SATA request-size limits are at least in the multi-MB range now, IIRC. So there's no excuse for a properly configured controller not to achieve something close to the maximum theoretical disk bandwidth (which should be in the ballpark you stated).

Now, it's also possible that you're experiencing a pathological embrace between your operating system and your RAID controller - at least if you're using RAID-5. For example, the Windows NT/2K/XP cache (I don't remember if you specified what OS you're using) usually writes back data in 64 KB chunks. If the controller sees such a write, and isn't aware that it can be performed lazily (and thus potentially coalesced with adjacent writes into a full-stripe transfer), and the OS also isn't performing the writes lazily such that it could submit additional 64 KB writes while waiting for the first to complete, it may well take 2 full disk revolutions before the next write request can be accepted (again, about 4 MB/s with 7200 rpm disks).

- bill
#7
Hey Bill,
You seem to know what you're talking about. We have a CX600 Clariion with 40 disks configured as RAID 5 (5 disks per RAID group). We then further stripe over 4 RAID groups per SP. From this we offer the hosts 2 "devices" (each device being striped over 4 RAID groups). On the host we then stripe over both these devices. Our application does reads/writes of 4 MB. On the host our stripe size is set to 2 MB (thus each request will go to both devices). The stripe size on the SP is 512 KB (thus the 2 MB request is split over all 4 RAID groups). Then within the RAID group we stripe on 128 KB (thus the 512 KB request is now split over 4 disks, with disk 5 for the parity). In other words we have all 40 disks (for writes) and 32 disks (for reads) working for us. Caching is fully enabled.

We notice that once the Clariion needs to start flushing to disk because the cache is full (which happens very quickly), our throughput is down to 40 MB/s. This means we're doing about 1.25 MB/s per disk. We were told that this is due to a bottleneck in the translation from FC to ATA and that nothing will resolve this unless we switch to FC disks. What are your thoughts on this? Thanks.
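For reference, here is how one 4 MB application I/O decomposes across the stripe levels described above, and what the observed 40 MB/s implies per spindle. The figures come straight from the post; the decomposition assumes perfectly aligned requests.

    app_io_kb = 4 * 1024       # application read/write size
    host_stripe_kb = 2 * 1024  # host stripe: split across 2 "devices"
    sp_stripe_kb = 512         # SP stripe: split across 4 RAID groups per device
    element_kb = 128           # RAID-group element: 4 data disks + 1 parity

    devices = app_io_kb // host_stripe_kb                            # 2
    groups_per_device = host_stripe_kb // sp_stripe_kb               # 4
    data_disks_per_group = sp_stripe_kb // element_kb                # 4
    data_disks = devices * groups_per_device * data_disks_per_group  # 32

    print(f"{data_disks} data disks, {element_kb} KB each, per 4 MB I/O")
    print(f"{40 / data_disks:.2f} MB/s per disk at 40 MB/s aggregate")   # 1.25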
#8
"Erik Hendrix" wrote in message s.com...

> Hey Bill, you seem to know what you're talking about.

Alas, only in a general sense. I have no direct acquaintance with the Clariion arrays.

> We have a CX600 Clariion with 40 disks configured as RAID 5 (5 disks per RAID group). We then further stripe over 4 RAID groups per SP. From this we offer the hosts 2 "devices" (each device being striped over 4 RAID groups). On the host we then stripe over both these devices. Our application does reads/writes of 4 MB. On the host our stripe size is set to 2 MB (thus each request will go to both devices). The stripe size on the SP is 512 KB (thus the 2 MB request is split over all 4 RAID groups). Then within the RAID group we stripe on 128 KB (thus the 512 KB request is now split over 4 disks, with disk 5 for the parity).

Unless I need sleep more than I'm aware of right now, that all sounds good.

> In other words we have all 40 disks (for writes) and 32 disks (for reads) working for us. Caching is fully enabled. We notice that once the Clariion needs to start flushing to disk because the cache is full (which happens very quickly), our throughput is down to 40 MB/s. This means we're doing about 1.25 MB/s per disk. We were told that this is due to a bottleneck in the translation from FC to ATA and that nothing will resolve this unless we switch to FC disks. What are your thoughts on this?

I once again lean toward incompetence as an explanation: either the people who told you this were incompetent, or the people who implemented that FC-to-ATA translation were (there's no reason it should be anything like that slow, unless they did something incredibly dumb - or intentionally so, in order to push their higher-end solutions - and simply used their tagged-queuing SCSI algorithms unchanged with the SATA disks, resulting in strictly serial/synchronous operation).

- bill
#9
"Erik Hendrix" wrote:
> You seem to know what you're talking about. We have a CX600 Clariion with 40 disks configured as RAID 5 (5 disks per RAID group). [...] We were told that this is due to a bottleneck in the translation from FC to ATA and that nothing will resolve this unless we switch to FC disks.

I wonder if you ever really corroborated this setup. I would not rely on that kind of calculation alone; you have to somehow "see" what is happening. RAID devices nowadays are said to have their own idiosyncrasies. For example, I was told to use RAID 5 instead of mirroring because supposedly the CX-400 is internally specifically optimized for RAID 5, so it would do something based on RAID 5 logic even if the RAID setup is different. Weird.

Bye
Rick Denoire
#10
"Bill Todd" wrote:
> Now, it's also possible that you're experiencing a pathological embrace between your operating system and your RAID controller - at least if you're using RAID-5. [...] it may well take 2 full disk revolutions before the next write request can be accepted (again, about 4 MB/s with 7200 rpm disks).

Well, since the disks are in the central SAN storage, to which several different hosts are attached via FC, I am able to do interesting experiments. I can deassign one LUN from one Sun/Solaris host and reassign it to an Intel/Linux host or to another Sun/Solaris machine. In ALL cases, performance was very poor. Even when one LUN was internally mirrored (no host involved), I was shocked at how slow that was. I am planning some experiments next time in order to demand action from Dell/EMC. I have no clue how to switch the hard disks' own cache on or off.

My question is: where do I have a chance to improve something? At the driver level, by changing the settings in the driver configuration file? Using Navisphere or navicli and changing the element size or the read-ahead feature or ...? And, as mentioned somewhere else, perhaps by switching the hard disks' own cache on??

Bye
Rick Denoire
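For the planned experiments, a simple host-side sequential-read check is often the quickest way to get a number to put in front of Dell/EMC. A minimal sketch of such a check follows; the device path is hypothetical and must be adjusted, it should be run read-only against an unused LUN as root, and without O_DIRECT or dropping the page cache the result can be inflated by host caching:

    import time

    DEVICE = "/dev/sdb"          # hypothetical LUN device path - adjust to yours
    CHUNK = 4 * 1024 * 1024      # 4 MB reads, roughly a full stripe
    TOTAL = 1024 * 1024 * 1024   # read 1 GB in total

    with open(DEVICE, "rb", buffering=0) as dev:
        start = time.time()
        done = 0
        while done < TOTAL:
            buf = dev.read(CHUNK)
            if not buf:
                break
            done += len(buf)
        elapsed = time.time() - start

    print(f"{done / (1024 * 1024) / elapsed:.1f} MB/s sequential read")

Running the same check against a LUN on the FC-SCSI disks gives a baseline to compare the SATA RAID group against before touching element size or read-ahead settings.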