Storage across multiple servers?

  #21  
Old November 24th 04, 09:52 AM
Mr.X

Hey There

On Sun, 21 Nov 2004 22:03:53 -0800, JB Orca wrote:

I am trying to find out how to do something that does not make much
sense to me. I have spoken to a few people who say they have taken
multiple servers (1u in this case) and striped the drives on those 1u
servers so that all servers in the group were seeing all data on all
servers.

Does this make sense to anyone?

To put it in context, this came up during a conversation over the
merits of NAS (NFS) vs. SAN.

The above idea was given to me as a 'cheaper' solution.

I'm not really sure the 'cheaper' solution is what I want, but I was
intrigued about how this is possible.

Does anyone know how to do this or what this is called? It would seem
it might be a 'cluster file system' but I see nothing like that when
performing the usual Google searches.

And...for the record, this was on Linux and possibly a *bsd.


You could use

http://www.lustre.org/

It's open source, it's supposed to be fast and stable, and you can use commodity hardware, which seems to be one of your main preferences.


Thomas Kirk
  #22  
Old November 24th 04, 05:46 PM
Faeandar

On Wed, 24 Nov 2004 09:52:42 GMT, "Mr.X" wrote:

Hey There

On Sun, 21 Nov 2004 22:03:53 -0800, JB Orca wrote:

I am trying to find out how to do something that does not make much
sense to me. I have spoken to a few people who say they have taken
multiple servers (1u in this case) and striped the drives on those 1u
servers so that all servers in the group were seeing all data on all
servers.

Does this make sense to anyone?

To put it in context, this came up during a conversation over the
merits of NAS (NFS) vs. SAN.

The above idea was given to me as a 'cheaper' solution.

I'm not really sure the 'cheaper' solution is what I want, but I was
intrigued about how this is possible.

Does anyone know how to do this or what this is called? It would seem
it might be a 'cluster file system' but I see nothing like that when
performing the usual Google searches.

And...for the record, this was on Linux and possibly a *bsd.


You could use

http://www.lustre.org/

It's open source, it's supposed to be fast and stable, and you can use commodity hardware, which seems to be one of your main preferences.


Thomas Kirk


If you ever talk to anyone who's installed it, you'll hear that it generally takes three PhDs to install and configure this thing. It's still not ready for prime time and requires a lot of kernel hacks for bug fixes, sometimes four a week.

Also, the write performance on this product blows. If you need insane read throughput it's a good solution, once you get it configured. But for any real write requirements, just forget it.

~F
  #23  
Old November 24th 04, 06:17 PM
V2

JB Orca wrote in message news:2004112312494650073%jborca@gmailcom...
On 2004-11-23 12:34:03 -0500, Arne Joris said:

JB Orca wrote:
...
I have a system that will need to start with roughly 5 terabytes of
storage space. It will very quickly grow to needing anywhere from
50-100 terabytes.


With these kinds of numbers, you'll have a lot of drives, and thus drive failures will become quite common. Are you looking at using some RAID configuration to overcome this?



Yes, I think that would be needed. The idea was perhaps to do 6-drive boxes with RAID 5 for each server.
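
For what it's worth, a 6-drive RAID 5 per box is easy enough to do with Linux software RAID. A minimal sketch, assuming six SCSI disks sdb through sdg and the mdadm tool (device names and the mount point are made up):

mdadm --create /dev/md0 --level=5 --raid-devices=6 \
  /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg   # 6-disk RAID 5
mkfs -t ext3 /dev/md0              # any journaling filesystem would do
mount /dev/md0 /export/data        # one RAID 5 volume per 1U server

That survives one dead disk per box, but a whole 1U server dying still takes its slice of the data offline with it.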


The problem we are attempting to solve is this: what is the best option
for the storage in this system? The original thought, before we
realized how big it was going to get, was just a large RAID direct
attach system. Then we thought about NAS or SAN, however, when I heard
the talk of spanning storage space across multiple servers this seemed
as though it might also be a good option.


This would be using your LAN to move data unless the server doing the I/O happens to have the target disk locally available, right? I guess with Gigabit Ethernet this might not be such a problem anymore, except for processor overhead.
A SAN will allow every server to use Fibre Channel to move the data, so your LAN and server CPUs won't be loaded nearly as much. Depending on your application load, you could save a lot on LAN switches and servers by spending more on a SAN.



It seems that SANs are more geared towards allowing multiple servers to
use a 'shared storage' of sorts, but that the 'shared storage' is
partitioned for each individual server accessing it. Is that the case?
I'm in need of more of a NAS type of solution where there is one large
pooled storage area that all servers can access. They will all need
access to the same files.


A NAS box is basically a server attached to a RAID box (Fibre Channel, SCSI, IDE, ATA, etc.), and you access your storage through that server. The problem with NAS is that the storage server in the NAS box becomes your bottleneck. You will also run into limitations once you start to grow your storage. If you go with a pure SAN, you can add storage independently of the servers. For what you need, you could use a server cluster to access the SAN and provide service to your clients.
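
To make that concrete, a bare-bones NFS setup looks something like this (the hostname "nashead", the export path, and the client subnet are all made up, and the options vary by OS):

# /etc/exports on the NAS head
/export/data   192.168.1.0/24(rw,sync)

# on the head, reload the export table
exportfs -a

# on each web server -- every byte of I/O funnels through that one head
mount -t nfs nashead:/export/data /data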

Have you thought about backups or mirroring?
...
The idea of a 'RAID' of servers seems fantastic. If I can use the
storage on 5 servers and stripe the data across them that would be
great, however, I have noticed with some of the options that in order
to add a new server the entire system needs to be taken down and
re-configured and brought back up.


The only reason to go this way instead of a regular SAN would be cost, I guess; by using plain old SCSI drives you'll cut down the cost significantly. But again, my first question: do you plan on using some form of RAID (software, RAID controller, RAID enclosure, ...)?

If you just plug a bunch of SCSI drives into a bunch of servers and start storing data on them, then at a hundred terabytes' worth of disks you'll be running around shutting down hosts to swap out disks all day, in my opinion.
Arne Joris


Ah. Good point. But, no matter which route I go I'm going to end up
with numerous drives, so I think this problem will exist no matter
what, right?


That's true, you will have numerous drives with either option. But if your data is stored across many servers, you also run into the problem of system hardware failure, not just hard drive failure. Once you start to have problems with your storage on those servers, you will first need to identify whether the server or the drive is the problem. Then there is the question of how to keep all those servers' OS and antivirus up to date. Good luck.

In the SAN configuration you would just be dealing with drive failures.

Thanks!

JB

  #24  
Old November 24th 04, 08:23 PM
Mr. X

Hey

On Wed, 24 Nov 2004 17:46:10 +0000, Faeandar wrote:

If you ever talk to anyone who's installed it, you'll hear that it generally takes three PhDs to install and configure this thing. It's still not ready for prime time and requires a lot of kernel hacks for bug fixes, sometimes four a week.


OK, I didn't know that. One of my friends recommended it for a project I'm working on currently. I haven't tried it, though.


Also, the write performance on this product blows. If you need insane read throughput it's a good solution, once you get it configured. But for any real write requirements, just forget it.


The 1.2 branch, which is production-ready but not free, is supposed to be better, according to ClusterFS.

/T

  #25  
Old November 24th 04, 09:15 PM
Faeandar

On Wed, 24 Nov 2004 21:23:14 +0100, "Mr. X"
wrote:

Hey

On Wed, 24 Nov 2004 17:46:10 +0000, Faeandar wrote:

If you ever talk to anyone who's installed it, you'll hear that it generally takes three PhDs to install and configure this thing. It's still not ready for prime time and requires a lot of kernel hacks for bug fixes, sometimes four a week.


OK, I didn't know that. One of my friends recommended it for a project I'm working on currently. I haven't tried it, though.


Also, the write performance on this product blows. If you need insane read throughput it's a good solution, once you get it configured. But for any real write requirements, just forget it.


The 1.2 branch, which is production-ready but not free, is supposed to be better, according to ClusterFS.

/T


It's "production ready" for places like Lawrence Livermoore and Sandia
National, not for chip companies or manufacturing or petroleum
exploration. Talk to ClusterFS and what they sell is consulting and
support, the product is open source so they can't charge for it. But
they can charge for install/config and support which, if you plan to
use it and don't have a few AstroPhysicists handy, you'll need.

~F
  #26  
Old November 24th 04, 10:02 PM
Mr. X

On Wed, 24 Nov 2004 21:15:02 +0000, Faeandar wrote:

It's "production ready" for places like Lawrence Livermoore and Sandia
National, not for chip companies or manufacturing or petroleum
exploration. Talk to ClusterFS and what they sell is consulting and
support, the product is open source so they can't charge for it. But
they can charge for install/config and support which, if you plan to
use it and don't have a few AstroPhysicists handy, you'll need.


OK, so you would recommend something like PolyServe, which is more of a plug-and-play solution, for something like a web cluster.

/T
  #27  
Old November 24th 04, 10:12 PM
Faeandar

On Wed, 24 Nov 2004 23:02:36 +0100, "Mr. X"
wrote:

On Wed, 24 Nov 2004 21:15:02 +0000, Faeandar wrote:

It's "production ready" for places like Lawrence Livermoore and Sandia
National, not for chip companies or manufacturing or petroleum
exploration. Talk to ClusterFS and what they sell is consulting and
support, the product is open source so they can't charge for it. But
they can charge for install/config and support which, if you plan to
use it and don't have a few AstroPhysicists handy, you'll need.


OK, so you would recommend something like PolyServe, which is more of a plug-and-play solution, for something like a web cluster.

/T


I can't say I'd recommend them without knowing your specifics. But what I can say is that I like their story and they perform as advertised. I think there are more solutions available if you are primarily concerned with read performance, but it's a smaller field if write performance is your game.

~F
  #28  
Old November 24th 04, 11:27 PM
JB Orca

On 2004-11-24 03:03:27 -0500, Faeandar said:

My first thought is you are over-architecting this. I don't see a
reason to require multi-host write access for something like this.
How many concurrent users and how much data is transferred?
Do you really need more than one server for data movement? Perhaps a
beefy server with a failover partner would suffice?

~F


Yeah, it's totally possible that I am thinking about this way too much and making it more complicated than it needs to be.

One thing that could be added is another level of servers between the
web servers and the storage system that will do nothing but perform
actions on the files in storage. Basically, a user could use the
front-end to select a group of files and ask for an action to be
performed on them. That action would be 'batched' (sort of) on the
middle servers, so they would need write (and read) access to the same
files as the front end web boxes.

One of the other reasons for the original thinking was to be able to
use cheaper machines in front. It's very easy to purchase additional
lower-end servers to perform those tasks and scale by adding boxes as
needed.


Well, without knowing how many concurrent users and how much traffic
there is you won't be able to architect a solution very well. You
need to understand those pieces first.


About the volume manager mentioned earlier... I understand that it is needed; the part I am confused about (with the SAN) is where the volume manager gets installed. There is a 'head' for the SAN, correct? That head is nothing more than a server of sorts? That 'head' has the various arrays attached to it, and the VM can control across those arrays? Is that close to what we are talking about?


A SAN used to be nothing more than fancy, and expensive, DAS (direct attached storage). Now clustered file systems are starting to differentiate it. But I digress....

A volume manager is host-based, so it lives on the individual hosts. It can do many things for the host, but in this case what we're talking about is its ability to take two distinct LUNs and bind them together to make a coherent single file system for the host. This is what allows you to take a LUN from array 1 and a LUN from array 2 and put them together to form a single file system for the host. Veritas is probably the most common third-party VM, maybe even for OSes that come with a VM built in. As long as you're not trying to use the VM to do any sort of RAID (that's what the array is for), it puts negligible load on the host to manage the LUNs for reads and writes.
An easy way to think of it is as a layer between the OS and the LUNs. The OS talks to the VM, and the VM talks to the LUNs. LUNs are nothing more than parts or wholes of physical drives in the array allocated to form a logical unit, hence the LU in LUN.
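
If you end up on Linux rather than Veritas, the same two-LUN join looks roughly like this with LVM2 (a sketch only; device names are made up and Veritas syntax is different):

pvcreate /dev/sdb /dev/sdc            # the two LUNs, one from each array
vgcreate bigvg /dev/sdb /dev/sdc      # bind them into one volume group
lvcreate -l 100%FREE -n data bigvg    # one logical volume spanning both LUNs
mkfs -t ext3 /dev/bigvg/data
mount /dev/bigvg/data /data           # the host sees a single file system

No RAID is being done here, just concatenation; the arrays still handle the redundancy, as above.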

For SANs there is no "head"; that's a NAS term. There are controller cards and storage ports. These are what get connected to hosts or switches. Fiber comes out of the host, into a switch, and from there into a storage port on the array. You can remove the switch if you want; it makes no difference except in the number of hosts you can hook up. Most arrays have 32 storage ports or fewer, whereas with switches you can fan out to a lot more than that.

~F


Great info. I really appreciate the help. My questions are sounding
more 'newbie' as I go along.

In terms of number of users, it would start small, probably around 50
concurrent users. However, within a few months that could grow a bit,
but I would never expect more than 200 concurrent users.

I am still grasping to understand this scenario:

Let's say I have one server and one array. (And a switch, just to be solid.)

That is an easy setup. Then I add a second array, this is still not
that big a deal as I can use the VM to 'join' the arrays so the server
is still seeing one filesystem.

The problem (for my head anyway) is when I add a second server to the setup.

Now I have 2 servers, 2 arrays and a switch. One server is already seeing one filesystem across both arrays, thanks to the VM; but how does the second server see the same filesystem?

Would I then just set up the VM on the second server to exactly what
server 1 is doing? So the VM in server 2 is set up exactly like the VM
on server 1?

Or...does a 'cluster filesystem' need to be in place at this point?

I now have a much better understanding of the SAN vs NAS idea, as well
as the two of them vs standard direct attached storage. The above issue
is the last piece that is still hurting my head.

And, just for the sake of discussion, if raw speed between the disks and the user is not my most important piece, does it make sense to look at a NAS device instead? Or just a big honking NFS server using a VM that I can grow a RAID on?

Also, for backups, my understanding is that it is much easier to back up
a SAN vs a NAS due to the fact that with the SAN I can do a complete
snapshot of the device rather than just the mountpoint and files with
the NAS. Correct?

I like the fact that with the SAN I can do away with the possible
headaches of having an actual 'server' controlling the storage,
meaning I don't have the issue of an NFS (or SAN head) server going
down and taking everything with it. So that is a strong point for sure.

Ok...am I making any more sense now or am I confusing the matter more?

Again...thanks MUCH for the assistance!

JB

  #29  
Old November 25th 04, 12:17 AM
Faeandar



In terms of number of users, it would start small, probably around 50
concurrent users. However, within a few months that could grow a bit,
but I would never expect more than 200 concurrent users.


Doubtful you need a lot of horsepower then. Most enterprise-class servers can handle that number of users.


I am still grasping to understand this scenario:

Let's say I have one server and one array. (And a switch, just to be solid.)

That is an easy setup. Then I add a second array, this is still not
that big a deal as I can use the VM to 'join' the arrays so the server
is still seeing one filesystem.

The problem (for my head anyway) is when I add a second server to the setup.

Now I have 2 servers, 2 arrays and a switch. One server is already seeing one filesystem across both arrays, thanks to the VM; but how does the second server see the same filesystem?

Would I then just set up the VM on the second server to exactly what
server 1 is doing? So the VM in server 2 is set up exactly like the VM
on server 1?

Or...does a 'cluster filesystem' need to be in place at this point?


Correct. This is where the cluster file system comes into play. If you are planning on sharing this data such that the SAN-attached servers act as NFS servers for the data in question, by all means go NAS. That is what you'd be doing anyway.
Two hosts cannot write to the same file system without some sort of software to arbitrate write locks. For most CFSs there's a lock manager on the FC network that hands out locks to hosts. In some cases that lock metadata is distributed among all the nodes. This is a whole 'nother topic that deserves its own thread.


I now have a much better understanding of the SAN vs NAS idea, as well
as the two of them vs standard direct attached storage. The above issue
is the last piece that is still hurting my head.

And, just for the sake of discussion, if raw speed between the disks and the user is not my most important piece, does it make sense to look at a NAS device instead? Or just a big honking NFS server using a VM that I can grow a RAID on?


I am a big fan of NAS; after all, NFS is the oldest running multi-write file system in the open-systems world (some 20 years now, I believe). And there are several NAS solutions that are faster than SAN solutions, depending on the data traffic. If you simply want multi-user access to the same data, and uber speed is not paramount, then go with NAS.
I like NetApp personally for performance and features, but BlueArc has a good story as well and is likely cheaper; NetApp can be fairly expensive, but it's great stuff.
No VM is needed for NAS because it has its own built in.


Also, for backups, my understanding is that it is much easier to back up
a SAN vs a NAS due to the fact that with the SAN I can do a complete
snapshot of the device rather than just the mountpoint and files with
the NAS. Correct?


SAN is easier to back up in almost all cases, but not because of snapshots. NAS snapshots are usually much better than SAN snapshots. The reason is that the NAS is the drives PLUS the file system. This means it can guarantee data integrity when it takes the snapshot, because it also performs the write operations and so can suspend writes while it takes the snapshot. A SAN cannot do this because it only controls the drives; the hosts control the file system.
NAS backups are a pain because generally NDMP is used, which usually requires additional licenses from your backup software vendor. It's a solid protocol, just not as easy to work with; it's very minimalist.


I like the fact that with the SAN I can do away with the possible
headaches of having an actual 'server' controlling the storage,
meaning I don't have the issue of an NFS (or SAN head) server going
down and taking everything with it. So that is a strong point for sure.


For the most part SAN arrays and networks are more stable but NAS can
be very highly available if you build it right. It all depends on
what you need. Are you willing to accept potentially more downtime
for better management and ease of use? Dunno, business call.

But a nice effect of NFS's statelessness is that it will keep trying a
connection for up to 15 minutes before giving up and staling the
mount. This is usually far more time than you need to get a NAS head
back up. Most problems with NAS heads are panic reboots, which are
usually 90 seconds or less, and OS upgrades, which are also reboots of
90 seconds or less.
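
How long the client actually waits comes down to mount options; a rough sketch of the two styles (server name and paths are made up):

# hard mount (the usual default): retry forever, so a 90-second head
# reboot just looks like a pause to the application
mount -t nfs -o hard,intr nashead:/export/data /data

# soft mount: give up after 'retrans' retries and return an I/O error
# instead (timeo is in tenths of a second)
mount -t nfs -o soft,timeo=600,retrans=5 nashead:/export/data /data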


Ok...am I making any more sense now or am I confusing the matter more?


Nope, you're doing just fine.

~F
  #30  
Old November 30th 04, 03:41 PM
Anton Rang

JB Orca writes:
It seems that SANs are more geared towards allowing multiple servers
to use a 'shared storage' of sorts, but that the 'shared storage' is
partitioned for each individual server accessing it. Is that the case?


It depends on the file system you're using. If you have a file system
which supports multiple writers, the storage needn't be partitioned.

I'm in need of more of a NAS type of solution where there is one large
pooled storage area that all servers can access. They will all need
access to the same files.


Why? It sounds like your data rates are going to be quite low (only
50-200 users?). Are you sure you can't get by with one server?

Ah. Good point. But, no matter which route I go I'm going to end up
with numerous drives, so I think this problem will exist no matter
what, right?


How much of your data is active on any given day?

I work on a product from Sun, SAM-QFS, which is a filesystem
integrated with an archiving system. It provides transparent movement
of data between disk and tape (rather like a traditional HSM, though
somewhat more flexible).

Depending on the details of your environment, it might be reasonable
to consider buying, say, 2-10 TB of disk, with a tape library as an
archive to hold further growth. This makes a lot of sense if you're
in an environment where you want to be able to store and quickly
retrieve old projects, for instance, but only a few are active at any
one time. In addition, it's relatively inexpensive to expand a tape
library over time (assuming it was sized right in the first place)
just by buying more tapes.

This has the additional advantage of taking care of backup for you
(simply tell the system to keep more than 1 copy on tape). Backing up
50 TB of data is not easy, especially if you can't be down for days
during the restore process.

Our product runs on Solaris, but there are some others available for
other platforms. It really sounds like this might be a simpler
solution to manage for you than 50 TB of disk storage.

(Incidentally, SAM-QFS is also a shared file system, so you could use
multiple servers with one SAN if desired. It supports up to 252 LUNs
in one file system, which makes expansion by buying new disk arrays
relatively easy. But if you don't already have Solaris in your
environment, you might not want to add it, though we do have many
customers who use Sun systems for their storage simply because of
SAM.)

-- Anton

(honest, i'm not in marketing. just thinking that having 50 TB of
disk probably isn't the right way to go if you're not accessing all of
that data all of the time. individual disks are cheap, but reliable
arrays are not, and the management of arrays gets expensive.)
 



