#1
120 TB SAN solutions
Hello,
I am shopping for a 120 TB SAN solution which will probably be accessed by a cluster of servers through a shared filesystem. Being new in this field, I would welcome some pointers on where I should start looking. I have already found some handy info in the archives of this list, but I wonder if there are any interesting recent iSCSI or ATA SAN solutions to look at. I think it would be appropriate, if you want to propose your company's solution to me, to do so by email instead of Usenet.

With kind regards,

--
Leon Woestenberg
Coredinal Realtime Systems
#2
Leon Woestenberg wrote:
> I am shopping for a 120 TB SAN solution which will probably be accessed by a cluster of servers through a shared filesystem. [...]

It would help to know:

1. More about the host OS requirements (i.e. which OS platforms the solution has to support).

2. The application for the SAN: is it one application on different OS platforms, or different OS platforms supporting different applications? What I'm really driving at here is the need for a shared filesystem, which will inevitably slow things down, limit the OS choices (potentially), and make things more complex.

3. Performance requirements; for example, if high performance is a requirement, then you can probably forget both iSCSI and ATA.

--
Nik Simpson
#3
"Nik Simpson" wrote in message ...
> 1. More about the host OS requirements (i.e. which OS platforms the solution has to support).

The SAN is written to and read from through a cluster of, say, N Linux servers. The cluster processes 200 data streams coming in at a steady 1 Mbit/second each, of which the results (also about 1 Mbit/second) are stored. As a result of processing, some very low-bitrate metadata about the streams is entered into a database, which is stored on the SAN as well.

We need redundancy throughout, i.e. no single point of failure. The cluster servers will have to fail over their processing applications. Every data stream is received by two servers out of the cluster, where the secondary acts as a hot spare, processing and storing the data stream if the primary server fails to do so. The hardest part (IMHO) will be to make sure that, in case of network or server failure, the secondary notices exactly where the primary server was in the data stream and takes over from there.

> 2. The application for the SAN: is it one application on different OS platforms, or different OS platforms supporting different applications? What I'm really driving at here is the need for a shared filesystem, which will inevitably slow things down, limit the OS choices (potentially), and make things more complex.

I thought that having a shared filesystem would limit the complexity by offering a distributed locking mechanism, which the cluster uses to manage the application I mentioned above? Anyway, we envision that all access to the storage (from different platforms) goes through the cluster, i.e. the cluster provides an NFS mount and a CIFS share to a range of platforms.

> 3. Performance requirements; for example, if high performance is a requirement, then you can probably forget both iSCSI and ATA.

From what I read about SANs, we are asking for a low level of performance.

Leon.
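That impression is easy to check with back-of-envelope arithmetic (a quick sketch; the 86,400-seconds-per-day and decimal-terabyte conventions are assumptions, as is round-the-clock storage of the result streams):

```python
# Aggregate load implied by the figures above:
# 200 streams at 1 Mbit/s each, both inbound and stored output.
STREAMS = 200
MBIT_PER_STREAM = 1  # Mbit/s per stream, each direction

# Aggregate rate in megabytes per second (8 Mbit = 1 MB)
mb_per_sec = STREAMS * MBIT_PER_STREAM / 8       # 25.0 MB/s each way

# Stored data per day, in decimal terabytes
tb_per_day = mb_per_sec * 86400 / 1_000_000      # ~2.16 TB/day

# Days until a 120 TB pool fills at that rate
days_to_fill = 120 / tb_per_day                  # ~55.6 days

print(mb_per_sec, round(tb_per_day, 2), round(days_to_fill, 1))
```

At 25 MB/s aggregate, the rate is indeed modest by SAN standards; the 120 TB total (roughly eight weeks of retention at this rate) is the demanding part.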
#4
"Leon Woestenberg" writes:
> [Leon describes the cluster of N Linux servers, the 200 x 1 Mbit/s streams, and the no-single-point-of-failure requirement.]

Doesn't sound like you need a SAN. Maybe some sort of NAS, which could be on IDE due to the very low performance requirements. It's probably not worth it to buy an FC card for each Linux box and the necessary FC switches, etc., when you would be using about 0.1% of the bandwidth of an FC link.

Depending on the value of "N" for "N Linux boxes" and where the 200 data streams are coming from, maybe writing to local disk on a couple of machines for redundancy is an option. With 250 GB IDE drives easily available, it is easy to stick 2 TB (maybe 1.5 TB with RAID5 + hot spare) in a fairly small box these days.

--
Douglas Siebert

"Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." -- Mark Twain
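Doug's per-box figure works out as follows (a sketch; the eight-bay chassis and decimal-gigabyte drive sizing are assumptions):

```python
# Usable capacity of a small IDE box: 8 bays of 250 GB drives,
# with RAID5 giving up one drive to parity and one kept as hot spare.
DRIVES = 8
DRIVE_GB = 250

raw_tb = DRIVES * DRIVE_GB / 1000           # 2.0 TB raw
usable_tb = (DRIVES - 2) * DRIVE_GB / 1000  # 1.5 TB after parity + spare

print(raw_tb, usable_tb)
```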
#5
I agree w/Doug here. A good IP network and NFS sounds like it would do the trick. Netapp has a very good solution in the R150 if you want to integrate the NFS server function, and their NFS server is extremely solid. Otherwise, NexSAN and LSI have very good IDE disk solutions, but you have to worry about load balancing across NFS servers yourself. Depending on how comfortable you are with large filesystems on Linux, that may be preferable.

Make sure you spec out a backup system with this system (if you need one).

--paul

"Leon Woestenberg" wrote in message ...
> [quoted text trimmed]
#6
"Leon Woestenberg" wrote in message ...
> The SAN is written to and read from through a cluster of, say, N Linux servers. The cluster processes 200 data streams coming in at a steady 1 Mbit/second each, of which the results (also about 1 Mbit/second) are stored. As a result of processing, some very low-bitrate metadata about the streams is entered into a database, which is stored on the SAN as well.

So far (25 MB/sec in, 25 MB/sec out, plus a bit of metadata) that doesn't sound beyond the capacity of a single, not even very muscular, commodity IA32 box to handle - unless the required processing is prodigious (or you need TCP/IP offload help and don't have it).

> We need redundancy throughout, i.e. no single point of failure. The cluster servers will have to fail over their processing applications.

That would seem to be a possible sticking point for the kind of NAS solution Doug proposed: the NAS box itself is a single point of failure, unless it employs strict synchronous replication to a partner box (I think that, e.g., NetApp has this now) which could take over close to instantaneously on the primary NAS box's failure. Even many 'SAN file systems' may have a single point of failure in their central metadata server unless it's implemented with fail-over capability (GFS is a rare exception, having distributed metadata management).

> Every data stream is received by two servers out of the cluster, where the secondary acts as a hot spare, processing and storing the data stream if the primary server fails to do so.

This should make possible 'N+1' redundancy, where you really only need about one extra server (beyond what's required just to handle the overall load): when a server fails, processing of its multiple streams is divided fairly equally among the survivors. Of course, given that the overall system shouldn't be all that expensive anyway, the added complexity of such an optimization may not be warranted.

If you simply paired each server with an otherwise redundant partner, you could not only avoid such a load-division strategy but potentially could dispense with SAN and NAS entirely and just replicate the entire operation (data receipt, data processing, data storage on server-attached disks). If nothing fails, use either server's copy; if a server fails, just have its partner continue doing what it was doing all along.

> The hardest part (IMHO) will be to make sure that, in case of network or server failure, the secondary notices exactly where the primary server was in the data stream and takes over from there.

It really shouldn't matter exactly where the primary was when it failed: what matters is how much of the output stream it had written to disk, and the secondary should just be able to interrogate that file (assuming it's something like a file) to find out. Unless (as described above) the secondary was already performing its own redundant processing and storage, in which case it doesn't even have to notice that the primary died (though whoever eventually needs the data needs to know which copy to use).

In such a case, it might be desirable to incorporate a mechanism whereby a spare server could be pressed into service, 'catch up' to the current disk state of the remaining partner (possibly taking over the original - failed - primary server's disks to help in this), and then become the new secondary - just in case over time the original partner decided to fail too (but *any* storage system you use should have similar provisions to restore redundancy after a failure).

However, at an overall storage rate of over 2 TB daily you're going to need far more disks to reach your 120 TB total than you need servers to do the processing (assuming a maximum of, say, eight 250 GB disks per server, you'd need about 60 servers just to hold the disk space required for a *single* copy of the data, but only a handful of servers to handle the processing).
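The "interrogate that file" idea can be sketched in a few lines (a hypothetical helper, assuming one append-only output file per stream; real takeover logic would also need heartbeat detection and fencing, which are omitted here):

```python
import os

def resume_offset(output_path):
    """Return how many bytes of the stream the failed primary had
    durably written, so the secondary can continue from there."""
    try:
        return os.path.getsize(output_path)
    except FileNotFoundError:
        return 0  # primary never got started; begin at byte 0

def take_over(stream, output_path):
    """Resume writing the stream where the primary's file left off."""
    offset = resume_offset(output_path)
    stream.seek(offset)  # skip what the primary already stored
    with open(output_path, "ab") as out:
        for chunk in iter(lambda: stream.read(65536), b""):
            out.write(chunk)
```

This only works because the stream itself is seekable/replayable from the secondary's buffer; the file size tells the secondary everything it needs to know about the primary's progress.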
This makes your original proposal for something like a shared-SAN file system more understandable: not for reasons of concurrent sharing, but simply for bulk storage. However, if all you need is bulk, you can obtain that by carving out a private portion of the SAN storage for each server pair, with no need for a SAN file system at all (think of each such private chunk as a single file system that fails over to the partner on primary failure - though since the partner is already active, it will need to understand how to perform a virtual file system restart to pick up the current state and move on from there).

Or, if you don't need fast access to older data you've acquired, you could just use attached server disks as staging areas for relatively quick dumps to tape. This reverts to the idea of having paired servers that accumulate data on attached disks: if the primary dies, the secondary picks up the process of dumping to tape (restarting from the last tape started, perhaps, to avoid the 'exactly where were we?' issue you noted). The mechanics of your bulk storage and how it's used *after* you've gathered the data seem more important here than the mechanics of gathering and processing it.

> I thought that having a shared filesystem would limit the complexity by offering a distributed locking mechanism, which the cluster uses to manage the application I mentioned above?

Possibly, but this particular application doesn't actually *share* any of its data among processors (at least not concurrently, in the shared-update manner that distributed locking typically facilitates), so something considerably simpler might suffice.

- bill
#7
Paul Galjan wrote:
> I agree w/Doug here. A good IP network and NFS sounds like it would do the trick. [...]

Agreed, the performance requirements don't merit the expense of a SAN, and the need for a shared file system makes NAS much more suitable for the task.

--
Nik Simpson