#21
Hey There
On Sun, 21 Nov 2004 22:03:53 -0800, JB Orca wrote:

I am trying to find out how to do something that does not make much sense to me. I have spoken to a few people who say they have taken multiple servers (1U in this case) and striped the drives on those servers so that every server in the group sees all data on all servers. Does this make sense to anyone? To put it in context, this came up during a conversation over the merits of NAS (NFS) vs. SAN. The above idea was offered to me as a 'cheaper' solution. I'm not really sure the 'cheaper' solution is what I want, but I was intrigued by how this is possible. Does anyone know how to do this, or what it is called? It would seem it might be a 'cluster file system', but I see nothing like that when performing the usual Google searches. And, for the record, this was on Linux and possibly a *BSD.

You could use http://www.lustre.org/ It's open source, it's supposed to be fast and stable, and you can use commodity hardware, which seems to be one of your main preferences.

Thomas Kirk
#22
On Wed, 24 Nov 2004 09:52:42 GMT, "Mr.X" wrote:
Hey There

On Sun, 21 Nov 2004 22:03:53 -0800, JB Orca wrote:
[...] Does anyone know how to do this, or what it is called? It would seem it might be a 'cluster file system', but I see nothing like that when performing the usual Google searches. [...]

You could use http://www.lustre.org/ It's open source, it's supposed to be fast and stable, and you can use commodity hardware, which seems to be one of your main preferences.

Thomas Kirk

If you ever talk to anyone who's installed it: it generally takes three PhDs to install and configure this thing. It's still not ready for prime time and requires a lot of kernel hacks for bug fixes, sometimes four a week. Also, the write performance on this product blows. If you need insane read throughput they're a good solution, once you get it configured. But for any real write requirements, just forget it.

~F
#23
JB Orca wrote in message news:2004112312494650073%jborca@gmailcom...
On 2004-11-23 12:34:03 -0500, Arne Joris said:

JB Orca wrote: I have a system that will need to start with roughly 5 terabytes of storage space. It will very quickly grow to needing anywhere from 50-100 terabytes.

With those kinds of numbers you'll have a lot of drives, and drive failures will become quite common. Are you looking at using some RAID configuration to overcome this?

Yes, I think that would be needed. The idea was perhaps to do 6-drive boxes with RAID 5 for each server. The problem we are attempting to solve is this: what is the best option for the storage in this system? The original thought, before we realized how big it was going to get, was just a large direct-attached RAID system. Then we thought about NAS or SAN; however, when I heard the talk of spanning storage space across multiple servers, this seemed as though it might also be a good option.

That would be using your LAN to move data unless the server doing the I/O happens to have the target disk locally available, right? I guess with gigabit ethernet this might not be such a problem anymore, except for processor overhead. A SAN will allow every server to use Fibre Channel to move the data, so your LAN and server CPUs won't be loaded nearly as much. Depending on your application load, you could save a lot on LAN switches and servers by spending more on a SAN.

It seems that SANs are more geared towards allowing multiple servers to use a 'shared storage' of sorts, but that the 'shared storage' is partitioned for each individual server accessing it. Is that the case? I'm in need of more of a NAS type of solution, where there is one large pooled storage area that all servers can access. They will all need access to the same files.

A NAS box is basically a server attached to a RAID box (Fibre Channel, SCSI, IDE, ATA, etc.), where you access your storage through that server. The problem with NAS is that the storage server in the NAS box becomes your bottleneck. You will also run into limitations once you start to grow your storage. If you go with a pure SAN, you can add storage independently of the servers. For what you need, you could use a server cluster to access the SAN and provide service to your clients. Have you thought about backup or mirroring?

The idea of a 'RAID' of servers seems fantastic. If I can use the storage on 5 servers and stripe the data across them, that would be great; however, I have noticed with some of the options that in order to add a new server the entire system needs to be taken down, re-configured, and brought back up.

The only reason to go this way instead of a regular SAN would be cost, I guess; by using plain old SCSI drives you'll cut the cost significantly. But again, my first question: do you plan on using some form of RAID (software, RAID controller, RAID enclosure, ...)? If you just plug a bunch of SCSI drives into a bunch of servers and start storing data on them, then at a hundred terabytes worth of disks you'll be running around all day shutting down hosts to swap out disks, in my opinion.

Arne Joris

Ah. Good point. But no matter which route I go, I'm going to end up with numerous drives, so I think this problem will exist no matter what, right?

True, you will have numerous drives with either option. But if your data is stored across many servers, you also run into the problem of system hardware failure, not just hard drives. Once you start to have problems with the storage on those servers, you first need to identify whether the server or the drive is the problem. Then there is keeping all those servers' OS and antivirus up to date. Good luck. In the SAN configuration you would just be dealing with drive failure.

Thanks! JB
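The back-of-the-envelope numbers behind the 6-drive RAID 5 node idea can be sketched as follows. The 250 GB drive size and 3% annualized failure rate are illustrative assumptions of my own (typical for the era), not figures from the thread:

```python
import math

# Rough sizing for 6-drive RAID 5 nodes. Drive size (250 GB) and the
# ~3% annualized failure rate are illustrative assumptions.

def raid5_usable_tb(drives_per_node, drive_tb):
    # RAID 5 spends one drive's worth of capacity per array on parity.
    return (drives_per_node - 1) * drive_tb

def nodes_needed(target_tb, drives_per_node, drive_tb):
    return math.ceil(target_tb / raid5_usable_tb(drives_per_node, drive_tb))

nodes = nodes_needed(100, 6, 0.25)   # 100 TB target, 6 x 250 GB per node
drives = nodes * 6
print(nodes)                 # 80 nodes
print(drives)                # 480 drives
# At a ~3% annualized failure rate that is roughly 14 failed drives a
# year, which is why Arne's point about swapping disks all day matters.
print(round(drives * 0.03))  # 14
```

The exact numbers shift with drive size, but the shape of the problem (hundreds of spindles, regular failures) holds for any build-out of this scale.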
#24
Hey
On Wed, 24 Nov 2004 17:46:10 +0000, Faeandar wrote:
If you ever talk to anyone who's installed it: it generally takes three PhDs to install and configure this thing. It's still not ready for prime time and requires a lot of kernel hacks for bug fixes, sometimes four a week.

Ok, I didn't know that. One of my friends recommended it for a project I'm currently working on. I haven't tried it though.

Also, the write performance on this product blows. If you need insane read throughput they're a good solution, once you get it configured. But for any real write requirements just forget it.

The 1.2 branch, which is production-ready (not free, though), is supposed to be better according to ClusterFS.

/T
#25
On Wed, 24 Nov 2004 21:23:14 +0100, "Mr. X"
wrote:
[...] The 1.2 branch, which is production-ready (not free, though), is supposed to be better according to ClusterFS. [...]

It's "production ready" for places like Lawrence Livermore and Sandia National Labs, not for chip companies or manufacturing or petroleum exploration. Talk to ClusterFS, and what they sell is consulting and support; the product is open source, so they can't charge for it. But they can charge for install/config and support, which, if you plan to use it and don't have a few astrophysicists handy, you'll need.

~F
#26
On Wed, 24 Nov 2004 21:15:02 +0000, Faeandar wrote:
It's "production ready" for places like Lawrence Livermore and Sandia National Labs, not for chip companies or manufacturing or petroleum exploration. Talk to ClusterFS, and what they sell is consulting and support; the product is open source, so they can't charge for it. But they can charge for install/config and support, which, if you plan to use it and don't have a few astrophysicists handy, you'll need.

Ok, so you would recommend something like PolyServ, which is more of a plug-and-play solution, for something like a web cluster?

/T
#27
On Wed, 24 Nov 2004 23:02:36 +0100, "Mr. X"
wrote:
[...] Ok, so you would recommend something like PolyServ, which is more of a plug-and-play solution, for something like a web cluster? [...]

I can't say I'd recommend them without knowing your specifics. But what I can say is that I like their story and they perform as advertised. I think there are more solutions available if you are primarily concerned with read performance, but it's a smaller field if write performance is your game.

~F
#28
On 2004-11-24 03:03:27 -0500, Faeandar said:
My first thought is that you are over-architecting this. I don't see a reason to require multi-host write access for something like this. How many concurrent users, and how much data is transferred? Do you really need more than one server for data movement? Perhaps a beefy server with a failover partner would suffice? ~F

Yeah, it's totally possible that I am thinking about this way too much and making it more complicated than it needs to be. One thing that could be added is another level of servers between the web servers and the storage system that will do nothing but perform actions on the files in storage. Basically, a user could use the front end to select a group of files and ask for an action to be performed on them. That action would be 'batched' (sort of) on the middle servers, so they would need write (and read) access to the same files as the front-end web boxes. One of the other reasons for the original thinking was to be able to use cheaper machines in front. It's very easy to purchase additional lower-end servers to perform those tasks and scale by adding boxes as needed.

Well, without knowing how many concurrent users there are and how much traffic there is, you won't be able to architect a solution very well. You need to understand those pieces first.

About the volume manager mention from earlier: I understand that it is needed; the part I am confused about (with the SAN) is where the volume manager is installed. There is a 'head' for the SAN, correct? That head is nothing more than a server of sorts? That 'head' has the various arrays attached to it, and the VM can control across those arrays? Is that close to what we are talking about?

A SAN was nothing more than fancy, and expensive, DAS (direct-attached storage). Now clustered file systems are starting to differentiate it. But I digress...

A volume manager is host-based, so it lives on the individual hosts. It can do many things for the host, but in this case what we're talking about is its ability to take two distinct LUNs and bind them together to make a coherent single file system for the host. This is what allows you to take a LUN from array 1 and a LUN from array 2 and put them together to form a single file system for the host. Veritas is probably the most common third-party VM, maybe even for OSes that come with a VM built in. As long as you're not trying to use the VM to do any sort of RAID (that's what the array is for), it puts negligible load on the host to manage the LUNs for reads and writes.

An easy way to think of it is as a layer between the OS and the LUNs: the OS talks to the VM, and the VM talks to the LUNs. LUNs are nothing more than parts or wholes of physical drives in the array allocated to form a logical unit, hence the LU in LUN.

For SANs there is no "head"; that's a NAS term. There are controller cards and storage ports, and these are what are connected to hosts or switches. Fibre comes out of the host, into a switch, and from there into a storage port on the array. You can remove the switch if you want; it makes no difference except in the number of hosts you can hook up. Most arrays have 32 storage ports or fewer, whereas with switches you can fan out to a lot more than that. ~F

Great info. I really appreciate the help. My questions are sounding more 'newbie' as I go along. In terms of number of users, it would start small, probably around 50 concurrent users. Within a few months that could grow a bit, but I would never expect more than 200 concurrent users.

I am still grasping to understand this scenario. Let's say I have one server and one array (and a switch, just to be solid). That is an easy setup. Then I add a second array; this is still not that big a deal, as I can use the VM to 'join' the arrays so the server still sees one filesystem. The problem (for my head, anyway) is when I add a second server to the setup. Now I have two servers, two arrays, and a switch. One server is already seeing one filesystem across both arrays, thanks to the VM; however, how does the second server see the same filesystem? Would I then just set up the VM on the second server exactly like the VM on server 1? Or does a 'cluster filesystem' need to be in place at this point?

I now have a much better understanding of the SAN vs. NAS idea, as well as the two of them vs. standard direct-attached storage. The above issue is the last piece that is still hurting my head. And, just for the sake of discussion, if raw speed between the disks and the user is not my most important piece, does it make sense to look at a NAS device instead? Or just a big honking NFS server using a VM that I can grow a RAID on?

Also, for backups, my understanding is that it is much easier to back up a SAN vs. a NAS, due to the fact that with the SAN I can do a complete snapshot of the device rather than just the mountpoint and files with the NAS. Correct?

I like the fact that with the SAN I can do away with the possible headaches of having an actual 'server' controlling the storage, meaning I don't have the issue of an NFS (or SAN head) server going down and taking everything with it. So that is a strong point for sure.

Ok... am I making any more sense now, or am I confusing the matter more? Again, thanks MUCH for the assistance! JB
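The "layer between the OS and the LUNs" role described above can be sketched as a simple address translation. This is a toy illustration of pure concatenation (no striping, no RAID), not how Veritas or any real volume manager is implemented:

```python
# Toy volume manager: concatenates two LUNs into one logical address
# space, so the filesystem above sees a single large device. Real VMs
# (Veritas VxVM, Linux LVM) do far more, but this is the core mapping.

class ConcatVolume:
    def __init__(self, lun_sizes):
        self.lun_sizes = lun_sizes  # blocks per LUN, in order

    def translate(self, logical_block):
        # Walk the LUNs until we find the one holding this logical block.
        offset = logical_block
        for lun_id, size in enumerate(self.lun_sizes):
            if offset < size:
                return (lun_id, offset)
            offset -= size
        raise ValueError("logical block beyond end of volume")

# A 1000-block LUN from array 1 plus a 2000-block LUN from array 2
# present themselves to the host as one 3000-block volume.
vol = ConcatVolume([1000, 2000])
print(vol.translate(999))   # (0, 999): last block of the first LUN
print(vol.translate(1000))  # (1, 0):   first block of the second LUN
```

Note this mapping lives on one host. A second host could compute the same mapping, but nothing here coordinates writes between hosts; that is exactly the gap a cluster filesystem fills.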
#29
In terms of number of users, it would start small, probably around 50 concurrent users. However, within a few months that could grow a bit, but I would never expect more than 200 concurrent users.

Doubtful you need a lot of horsepower, then. Most enterprise-class servers can handle that number of users.

[...] One server is already seeing one filesystem across both arrays, thanks to the VM; however, how does the second server see the same filesystem? Would I then just set up the VM on the second server exactly like the VM on server 1? Or does a 'cluster filesystem' need to be in place at this point?

Correct: this is where the cluster file system comes into play. If you are planning on sharing this data such that the SAN-attached servers act as NFS servers for the data in question, by all means go NAS; that is what you'd be doing anyway. Two hosts cannot write to the same file system without some sort of software to allow write locks. For most CFSs there's a lock manager on the FC network that hands out locks to hosts; in some cases that lock metadata is distributed among all the nodes. This is a whole 'nother topic that deserves its own thread.

And, just for the sake of discussion, if raw speed between the disks and the user is not my most important piece, does it make sense to look at a NAS device instead? Or just a big honking NFS server using a VM that I can grow a RAID on?

I am a big fan of NAS; after all, NFS is the oldest running multi-write file system in the open-systems world (some 20 years now, I believe). And there are several NAS solutions that are faster than SAN solutions, depending on the data traffic. If you simply want multi-user access to the same data, and uber speed is not paramount, then go with NAS. I like NetApp personally for performance and features, but BlueArc has a good story as well, and is likely cheaper; NetApp can be fairly expensive, but it's great stuff. No VM is needed for NAS because it has its own built in.

Also, for backups, my understanding is that it is much easier to back up a SAN vs. a NAS, due to the fact that with the SAN I can do a complete snapshot of the device rather than just the mountpoint and files with the NAS. Correct?

A SAN is easier to back up in almost all cases, but not because of snapshots. NAS snapshots are usually much better than SAN snapshots. The reason is that the NAS is the drives plus the file system. This means it can guarantee data integrity when it takes the snapshot, because it also performs the write operations, and so it suspends writes while it takes the snapshot. A SAN cannot do this because it only controls the drives; the hosts control the file system. NAS backups are a pain because NDMP is generally used, which usually requires additional licenses from your backup software vendor. It's a solid protocol, just not as easy to work with; it's very minimalist.

I like the fact that with the SAN I can do away with the possible headaches of having an actual 'server' controlling the storage, meaning I don't have the issue of an NFS (or SAN head) server going down and taking everything with it. So that is a strong point for sure.

For the most part, SAN arrays and networks are more stable, but NAS can be made very highly available if you build it right. It all depends on what you need. Are you willing to accept potentially more downtime for better management and ease of use? Dunno, business call. But a nice effect of NFS's statelessness is that it will keep retrying a connection for up to 15 minutes before giving up and staling the mount. This is usually far more time than you need to get a NAS head back up. Most problems with NAS heads are panic reboots, which are usually 90 seconds or less, and OS upgrades, which are also reboots of 90 seconds or less.

Ok... am I making any more sense now, or am I confusing the matter more?

Nope, doing just fine.

~F
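The lock-manager idea mentioned above, a service handing out write locks so two hosts don't corrupt a shared filesystem, can be sketched in miniature. This is a toy illustration of the grant/deny/release cycle only; real CFS lock managers handle byte ranges, leases, and node failover, and no real protocol is shown here:

```python
# Toy cluster lock manager: one host at a time may hold the write lock
# on a given path. This shows why multi-host write access needs extra
# software on top of shared LUNs.

class LockManager:
    def __init__(self):
        self.holders = {}  # path -> host currently holding the write lock

    def acquire(self, host, path):
        if path in self.holders and self.holders[path] != host:
            return False  # another host holds it; the caller must wait
        self.holders[path] = host
        return True

    def release(self, host, path):
        if self.holders.get(path) == host:
            del self.holders[path]

lm = LockManager()
print(lm.acquire("host1", "/data/file"))  # True: lock granted
print(lm.acquire("host2", "/data/file"))  # False: host1 holds it
lm.release("host1", "/data/file")
print(lm.acquire("host2", "/data/file"))  # True: now available
```

Without this arbitration, both hosts would happily cache and write the same filesystem metadata, which is exactly how shared-LUN setups get corrupted.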
#30
JB Orca writes:
It seems that SANs are more geared towards allowing multiple servers to use a 'shared storage' of sorts, but that the 'shared storage' is partitioned for each individual server accessing it. Is that the case?

It depends on the file system you're using. If you have a file system which supports multiple writers, the storage needn't be partitioned.

I'm in need of more of a NAS type of solution where there is one large pooled storage area that all servers can access. They will all need access to the same files.

Why? It sounds like your data rates are going to be quite low (only 50-200 users?). Are you sure you can't get by with one server?

Ah. Good point. But no matter which route I go, I'm going to end up with numerous drives, so I think this problem will exist no matter what, right?

How much of your data is active on any given day? I work on a product from Sun, SAM-QFS, which is a filesystem integrated with an archiving system. It provides transparent movement of data between disk and tape (rather like a traditional HSM, though somewhat more flexible). Depending on the details of your environment, it might be reasonable to consider buying, say, 2-10 TB of disk, with a tape library as an archive to hold further growth. This makes a lot of sense if you're in an environment where you want to be able to store and quickly retrieve old projects, for instance, but only a few are active at any one time. In addition, it's relatively inexpensive to expand a tape library over time (assuming it was sized right in the first place) just by buying more tapes.

This has the additional advantage of taking care of backup for you (simply tell the system to keep more than one copy on tape). Backing up 50 TB of data is not easy, especially if you can't be down for days during the restore process. Our product runs on Solaris, but there are some others available for other platforms. It really sounds like this might be a simpler solution to manage for you than 50 TB of disk storage.

(Incidentally, SAM-QFS is also a shared file system, so you could use multiple servers with one SAN if desired. It supports up to 252 LUNs in one file system, which makes expansion by buying new disk arrays relatively easy. But if you don't already have Solaris in your environment, you might not want to add it, though we do have many customers who use Sun systems for their storage simply because of SAM.)

-- Anton

(Honest, I'm not in marketing. I'm just thinking that having 50 TB of disk probably isn't the right way to go if you're not accessing all of that data all of the time. Individual disks are cheap, but reliable arrays are not, and the management of arrays gets expensive.)