#1
SAN filesystem uses local storage for reads with synchronous replication
"George Orwell" wrote in message ... I have servers and storage at two sites, with an FC bridge over an ip WAN in between them to create a SAN. Now I would like to run a distributed filesystem on servers at both sites, accessing the same storage. For redundancy reasons, I would also like to replicate all the storage so both sites have a copy of the data at all times. If I chose synchronous replication, I would expect each site to be able to use the local copy for reads, giving me local read speeds. Does anyone know of a filesystem/SAN appliance combination that can give me this ? The short answer is, I don't. Most storage-level 'synchronous replication' mechanisms were not set up to allow any (coordinated) access at all to the 'remote' replica, because they were created back when storage-level replication was usually done with a 'passive' remote copy meant only to take over if the primary copy became inaccessible. Some distributed systems (VMS's comes to mind) will cheerfully optimize reads to target the nearest replica if you make them aware that this is desired. But that facility is part of VMS's system-software-controlled mirroring facilities rather than of an underlying hardware array doing its own remote mirroring (though its HSx controllers may offer similar facilities nowadays). Finally, even if you find some hardware set up to allow concurrent read activity at both replicas, unless you are using a distributed file system that optimizes its coordination (locking) mechanisms across both sites so that at least when possible locks are managed at the end that's actually using the associated data (again, VMS comes to mind here), you still may have inter-site lock traffic slowing things down even if the actual read accesses are performed locally (though that will only affect latency, rather than create the serious inter-site bandwidth demands that reading only at the 'primary' site might). Take a look at Lust it created a clone of the VMS distributed lock manager, and might (or might not) offer what you're looking for (though in a distributed-file-server-style environment rather than a conventional SAN). AIX's GPFS might too (IBM cloned the VMS lock manager about 10 years ago, primarily to support Oracle Parallel Server if I understand correctly, and may later have used it with GPFS in a manner you might find useful). But last I knew (and things might have changed since), most SAN file systems thought they were doing pretty well just coordinating shared access to storage from multiple hosts, without getting into finer points such as optimizing disaster-tolerant configurations where both sites were active. So if you find out differently, please let us know. - bill |
#2
"Tarapia Tapioco" wrote in message x.it... .... It depends on the type of traffic; for large files (hundreds of megabytes) the locking isn't a big issue if it's done at the file level. Indeed - if you're not doing concurrent distributed writes to the same file. But I'm interested in the locking mechanism you descibed; is this a truly distributed meta data mechanism ? Well, it's certainly distributed *management* of concurrently-shared metadata, rather than the more common central-metadata-server approach. The metadata itself may be distributed across sites, or centralized. I have heard that the overhead to distribute meta data operations isn't worth it unless you intend to do I/O to many small files (so the ratio of meta data operations to I/O is high). I'm not sure how that makes sense: the less metadata is used, the less impact it will have on overall activity regardless of how it may be managed. The main reason metadata management is distributed in VMS is likely to make the distributed file system resilient to single-node failures and peer-to-peer rather than client/server in nature, but in practice the overheads do not seem significant. Does VMS use a directory server to show which server "owns" a certain range of files or directories ? If the accesses to a file are dominated by a single system, all lock management for that file migrates automatically to that system (so the relatively infrequent accesses by other systems take a minor hit by having to send lock activity there, but most activity executes at local-host speeds). A new accessor queries one of the cluster's lock directory servers (nothing special about them - they just avoid the need for every system in the cluster to maintain high-level lock directory information) to find out where a given file's locks are 'mastered'. Take a look at Lust it created a clone of the VMS distributed lock manager, and might (or might not) offer what you're looking for (though in a distributed-file-server-style environment rather than a conventional SAN). AIX's GPFS might too (IBM cloned the VMS lock manager about 10 years ago, primarily to support Oracle Parallel Server if I understand correctly, and may later have used it with GPFS in a manner you might find useful). I am looking at CXFS (SGI's DFS). It doesn't do distributed locking, but that is not a big issue for me. I think all I need is a replicating appliance that allows I/O to the replica; if I zone my sites so each can only see the local copy, CXFS has no choice but to use the local one. As long as absolutely no writes are going to disk, that should work fine. As soon as any write activity occurs (even background writes such as updating 'last access times', where no application-level writes exist), the local file system's cached data (which it believes it can cache safely because it 'owns' that file system exclusively) will start to get stale, and, worse, may then be written back out to disk stale. The safest way to ensure that no writes are occurring is to write-lock the disks. And you'll then soon find out whether the file system is OK with that (early NTFS, for example, was not, IIRC). Unless I'm misunderstanding your intent, and the instances of CXFS on the various systems *will* in fact be aware of each other (you said no distributed locking, though, so unless CXFS operates as a client/server architecture - at least as far as locking goes, and my vague recollection now is that it may - I'm not sure how they could coordinate with each other). 
Interesting thought about making different (local) mirror-disks appear to be the same to all instances and letting a transparent disk-level replicator keep them in synch: as long as CXFS does effectively lock anything being updated such that no one else can look at it until the update completes (at *all* copies), it might work - but you'd still lose a site if its local mirror copy failed unless some provision for automatic revectoring to a surviving remote disk existed. - bill |
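[For illustration, a loose sketch of the lock-mastery scheme described above, modeled on the VMS distributed lock manager as characterized in this thread. All names and the migration threshold are hypothetical, not an actual VMS, Lustre, or GPFS interface.]

    # Sketch: lock directory lookup plus mastery migration to the
    # dominant accessor. Hypothetical names and threshold throughout.

    class LockDirectory:
        """Maps a resource (file) name to the node currently mastering
        its locks, so no node has to track every resource in the cluster."""
        def __init__(self):
            self.master_of = {}

        def lookup_or_assign(self, resource, requester):
            # First accessor becomes the master; later accessors are
            # directed to it (one directory round trip per file, not per lock).
            return self.master_of.setdefault(resource, requester)

    class Node:
        def __init__(self, name, directory):
            self.name = name
            self.directory = directory
            self.remote_requests = {}   # resource -> count of off-node lock ops

        def lock(self, resource):
            master = self.directory.lookup_or_assign(resource, self.name)
            if master == self.name:
                return "local"          # lock managed at local-host speed
            # Remote lock traffic; if this node dominates, remaster here.
            n = self.remote_requests.get(resource, 0) + 1
            self.remote_requests[resource] = n
            if n > 10:                  # arbitrary threshold for the sketch
                self.directory.master_of[resource] = self.name
                return "migrated"
            return "remote"

    directory = LockDirectory()
    site1 = Node("site1", directory)
    site2 = Node("site2", directory)

    site1.lock("/data/run42")            # site1 becomes the lock master
    for _ in range(12):                  # site2 becomes the dominant accessor,
        result = site2.lock("/data/run42")
    print(result)                        # so mastery migrates: prints 'local'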
#3
Just use a pair of NetApp Filers (one at each site) running SnapMirror - very easy. Since NetApp's file system is not Unix or Windows, you are less susceptible to virus propagation. Most synchronous replication software replicates viruses in real time - this erodes both the data and the OS.

"George Orwell" wrote in message ...

> I have servers and storage at two sites, with an FC bridge over an IP WAN in between them to create a SAN. Now I would like to run a distributed filesystem on servers at both sites, accessing the same storage. For redundancy reasons, I would also like to replicate all the storage so both sites have a copy of the data at all times. If I choose synchronous replication, I would expect each site to be able to use the local copy for reads, giving me local read speeds. Does anyone know of a filesystem/SAN appliance combination that can give me this?
>
> Arne Joris
#4
Bill Todd wrote:
>> It depends on the type of traffic; for large files (hundreds of megabytes) the locking isn't a big issue if it's done at the file level.
>
> Indeed - if you're not doing concurrent distributed writes to the same file.

Yeah, you are right, concurrent writes will hammer the locks. But my traffic consists of a single writer and multiple readers.

>> I have heard that the overhead to distribute metadata operations isn't worth it unless you intend to do I/O to many small files (so the ratio of metadata operations to I/O is high).
>
> I'm not sure how that makes sense: the less metadata is used, the less impact it will have on overall activity, regardless of how it may be managed. The main reason metadata management is distributed in VMS is likely to make the distributed file system resilient to single-node failures and peer-to-peer rather than client/server in nature; in practice the overheads do not seem significant.

I imagine the indirection of going to a directory server to look up the server owning the metadata *can* introduce extra latency (over the WAN), plus CPU cycles, for non-metadata-intensive workloads.

> If the accesses to a file are dominated by a single system, all lock management for that file migrates automatically to that system (so the relatively infrequent accesses by other systems take a minor hit by having to send lock activity there, but most activity executes at local-host speeds). A new accessor queries one of the cluster's lock directory servers (nothing special about them - they just avoid the need for every system in the cluster to maintain high-level lock directory information) to find out where a given file's locks are 'mastered'.

Yeah, it sounds like the distributed metadata service would create a more site-aware solution than the single-metadata-server model, just by virtue of doing all metadata operations at the site where the data is produced or consumed. Now for my traffic, I have a writer at site 1 and several readers at both sites 1 and 2. So really the data is being accessed at both sites, and we'll have to go over the WAN regardless of where the metadata is kept.

> As long as absolutely no writes are going to disk, that should work fine. As soon as any write activity occurs (even background writes such as updating 'last access times', where no application-level writes exist), the local file system's cached data (which it believes it can cache safely because it 'owns' that file system exclusively) will start to get stale and, worse, may then be written back out to disk stale.

No, CXFS should maintain host cache coherency; that's one of the main tasks a distributed file system should perform (along with file locking).

> Interesting thought about making different (local) mirror disks appear to be the same to all instances and letting a transparent disk-level replicator keep them in sync: as long as CXFS effectively locks anything being updated such that no one else can look at it until the update completes (at *all* copies), it might work - but you'd still lose a site if its local mirror copy failed, unless some provision for automatic revectoring to a surviving remote disk existed.

My applications do not require a very strict data coherency model; the readers are processing data in the files that is well behind the data the writer is appending (think seismic data being written, with several readers doing different kinds of processing on data that is at least 20 minutes old).

The only requirement is that when the readers keep reading the same file without opening or closing it, they should eventually see the new data the writer has been appending. So even if the writer's very latest data is still in the host cache, it should eventually be flushed, and the synchronous mirroring ought to make it show up at the other site. I hope CXFS's host cache coherency will cause the writer to flush within guaranteed timelines so the readers can see the data (I'm not sure what those timelines are, but I have a big margin).

Arne Joris
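[Assuming appended data does eventually become visible to readers - writer cache flush plus synchronous mirroring, as described above - the reader side of this workload could look something like the following sketch. This is plain POSIX-style file I/O, not a CXFS API, and the path is hypothetical.]

    # Sketch: follow a growing file without reopening it, tolerating
    # visibility lag, since readers only touch data minutes old anyway.
    import time

    def follow(path, poll_interval=5.0):
        """Yield newly appended data whenever the file system makes it
        visible; sleep when nothing new has appeared yet."""
        with open(path, "rb") as f:
            offset = 0
            while True:
                f.seek(offset)
                chunk = f.read()
                if chunk:
                    offset += len(chunk)
                    yield chunk               # hand new data to processing
                else:
                    time.sleep(poll_interval) # nothing new visible yet

    # Usage (at either site, reading its local replica):
    # for block in follow("/san/seismic/run42.dat"):
    #     process(block)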
#5
"Monte Oates" wrote in message news:dhmdd.157595$a41.79236@pd7tw2no... Just use a pair of NetApp Filers (one at each site) running SnapMirror - very easy. Since NetApp's file system is not unix or Windows you are less susceptible to virus propagation. Most synchronous replication software replicates real-time the viruses - this erodes both data and the OS. Regardless of the merit of such a suggestion for some people, the original poster specified the behavior he wants - and being an asynchronous replication mechanism SnapMirror won't give it to him. - bill |
#6
"Arne Joris" wrote in message news:v5Qdd.790510$M95.165193@pd7tw1no... Bill Todd wrote: .... I have heard that the overhead to distribute meta data operations isn't worth it unless you intend to do I/O to many small files (so the ratio of meta data operations to I/O is high). I'm not sure how that makes sense: the less metadata is used, the less impact it will have on overall activity regardless of how it may be managed. The main reason metadata management is distributed in VMS is likely to make the distributed file system resilient to single-node failures and peer-to-peer rather than client/server in nature, but in practice the overheads do not seem significant. I imagine the indirection of going to a directory server to look up the server owning the meta data *can* introduce extra latency (over the WAN) plus cpu cycles for non-metadata intensive workloads. Since in the absence of active contention it goes to the directory server only once per file or directory accessed, that latency (which should be less than a single disk access) should usually be negligible. .... it sounds like the distributed meta data service would create a more site aware solution than the single meta data server model, just by virtue of doing all meta data operations at the site where the data is produced or consumed. Not really: where the metadata is processed has relatively little to do with what disk it's obtained from, unless the system goes to some essentially orthogonal effort to access the most local copy available. Now for my traffic, I have a writer at site 1, and several readers at both sites 1 and 2. So really the data is being accessed at both sites, so we'll have to go over the WAN regardless of where the metadata is kept. It doesn't sound as if you really care much about remote metadata access, at least as long as the amount of metadata processed is negligible compared with the amount of data fetched. Inter-site latencies tend not to become *really* noticeable until distances in the hundreds of miles are involved (100 miles being on the order of a millisecond, one-way). And if you're mostly reading large files sequentially, it doesn't sound as if latency should be any real concern there, either. But unless you've got unlimited bandwidth between the two sites, your desire to do reads locally is understandable. - bill |
#7
Bill Todd wrote:
>> it sounds like the distributed metadata service would create a more site-aware solution than the single-metadata-server model, just by virtue of doing all metadata operations at the site where the data is produced or consumed.
>
> Not really: where the metadata is processed has relatively little to do with what disk it's obtained from, unless the system goes to some essentially orthogonal effort to access the most local copy available.

But my metadata would also be replicated to both sites, so disk access is *always* local (though writes, being replicated synchronously, would be a bit slow). In that case, having metadata operations handled at the site that is about to do I/O would not introduce *any* cross-WAN operations at all.

>> Now for my traffic, I have a writer at site 1 and several readers at both sites 1 and 2. So really the data is being accessed at both sites, and we'll have to go over the WAN regardless of where the metadata is kept.
>
> It doesn't sound as if you really care much about remote metadata access, at least as long as the amount of metadata processed is negligible compared with the amount of data fetched. Inter-site latencies tend not to become *really* noticeable until distances in the hundreds of miles are involved (100 miles being on the order of a millisecond, one-way).

I have latencies of 100 ms round trip (thousands of miles, plus the WAN is routed, which introduces switching latencies).

> And if you're mostly reading large files sequentially, it doesn't sound as if latency should be any real concern there, either. But unless you've got unlimited bandwidth between the two sites, your desire to do reads locally is understandable.

Yeah, my main concern is not metadata access; I care mostly about the data being accessed locally.

Arne Joris
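[A rough model of the bandwidth asymmetry driving that preference: with synchronous mirroring, written data crosses the WAN once, while remote reads from the primary site would move it once per remote reader. The rates below are illustrative assumptions, not figures from the thread.]

    # WAN bandwidth: local reads vs. reading from the primary site.
    write_rate_mb_s = 10     # hypothetical writer throughput at site 1
    remote_readers = 3       # hypothetical readers at site 2, each assumed
                             # to read everything the writer produces

    wan_local_reads = write_rate_mb_s                                   # mirror traffic only
    wan_remote_reads = write_rate_mb_s + remote_readers * write_rate_mb_s

    print(f"reads served locally : {wan_local_reads} MB/s over the WAN")   # 10
    print(f"reads served remotely: {wan_remote_reads} MB/s over the WAN")  # 40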