#11
EMC to IBM SAN LUN replication
mackdaddy315 wrote:
> ... I am looking for the ability to mirror those LUNs to the other SAN and have another node of each cluster connected at the other site in case a complete site failover is needed.

How far are those two SANs from each other? The term 'site failover' makes me think you are talking about two separate physical locations. From what I know about the IBM LUN mirroring option, it would do this with another IBM SAN, but when you wanted to fail back you would have to break the mirror set and recreate the mirror. If you want synchronous replication of all data from your active site to another site, and you want to use different storage vendors at each site, you can either use a storage appliance or use host mirroring. Host-based IBM LUN mirroring apparently has some drawbacks, so you might want to look into storage appliances like IBM's SAN Volume Controller (SVC). If you want asynchronous replication (because there is a large distance between the sites) and use different vendors at each site, the only option I'm aware of is a set of YottaYotta boxes at each site.

Arne
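For concreteness, the host-mirroring option boils down to the write path sketched below in Python (a toy model, not any vendor's driver; the device paths are hypothetical, and for a safe demo you would point them at scratch files instead of real LUNs). Every write must reach both arrays before it is acknowledged, which is also why the approach degrades with inter-site distance:

```python
import os

# Hypothetical multipath device nodes for the two arrays.
EMC_LUN = "/dev/mapper/emc_lun0"
IBM_LUN = "/dev/mapper/ibm_lun0"

class HostMirror:
    """Synchronous host-based mirror: a write completes only after both
    arrays have acknowledged it, so either copy is always current."""

    def __init__(self, path_a, path_b):
        # A real mirror driver would use O_DIRECT and handle leg failures;
        # this sketch shows only the basic write-both semantics.
        self.fds = [os.open(path_a, os.O_RDWR), os.open(path_b, os.O_RDWR)]

    def write(self, offset, data):
        for fd in self.fds:
            os.pwrite(fd, data, offset)   # duplicate the write to each leg
            os.fsync(fd)                  # no ack until both copies are stable

    def read(self, offset, length):
        # Reads can be served from either leg; prefer the faster/local array.
        return os.pread(self.fds[0], length, offset)
```

The fail-back pain mentioned above follows from the same structure: once a leg has been dropped, a simple mirror keeps no record of which blocks changed, so re-attaching it generally means breaking the set and resilvering from scratch.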
#12
EMC to IBM SAN LUN replication
Arne Joris wrote:
> If you want asynchronous replication (because you have a large distance between the sites) and use different vendors at each site, the only option I'm aware of is a set of YottaYotta boxes at each site.

I know for a fact that both the FalconStor and DataCore storage virtualization platforms support async mirroring between disparate storage platforms.

--
Nik Simpson
#13
EMC to IBM SAN LUN replication
Nik Simpson wrote:
> Arne Joris wrote:
>> If you want asynchronous replication (because you have a large distance between the sites) and use different vendors at each site, the only option I'm aware of is a set of YottaYotta boxes at each site.
>
> I know for a fact that both the FalconStor and DataCore storage virtualization platforms support async mirroring between disparate storage platforms.

Yeah, if you don't need your hosts at the second site to do anything with the data until they take over when disaster strikes (i.e., the second site sits idle), there are more options.
#14
EMC to IBM SAN LUN replication
Arne Joris wrote:
Nik Simpson wrote:
> Arne Joris wrote:
>> If you want asynchronous replication (because you have a large distance between the sites) and use different vendors at each site, the only option I'm aware of is a set of YottaYotta boxes at each site.
>
> I know for a fact that both the FalconStor and DataCore storage virtualization platforms support async mirroring between disparate storage platforms.

Yeah, if you don't need your hosts at the second site to do anything with the data until they take over when disaster strikes (i.e., the second site sits idle), there are more options.

If you want to process the data for applications like data mining or backup, you can take a snapshot of the remote mirror (both platforms can do that) and serve the snapshot up to an application server for processing. What else did you have in mind? And if "there are more options," why not share your knowledge with us?

--
Nik Simpson
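The snapshot-and-serve workflow described here follows the same sequence on either platform. The appliance client below is hypothetical shorthand (the method names are illustrative, not FalconStor's or DataCore's actual API), but the ordering is the point:

```python
def serve_snapshot_for_processing(appliance, volume, processing_host):
    """Present a point-in-time copy of the remote mirror to a secondary
    host for backup or data mining, without loading the primary site.

    `appliance` is a hypothetical vendor-API stand-in; all method
    names here are illustrative only.
    """
    appliance.quiesce(volume)          # reach a consistent point in the mirror
    try:
        snap = appliance.create_snapshot(volume)
    finally:
        appliance.resume(volume)       # replication resumes immediately
    # The snapshot is an independent, typically read-only LUN, so the
    # processing host can hammer it with zero WAN or primary-site cost.
    appliance.map_lun(snap, processing_host)
    return snap
```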
#15
EMC to IBM SAN LUN replication
Nik Simpson wrote:
> If you want to process the data for applications like data mining or backup, you can take a snapshot of the remote mirror (both platforms can do that) and serve the snapshot up to an application server for processing. What else did you have in mind? And if "there are more options," why not share your knowledge with us?

Well, for example, if your secondary site is not just a remote data vault but an actual production site where people need access to the data, it is pretty lame to have servers at the secondary site go over the WAN to read the data from primary storage when they have a copy of the data right there! With a distributed block cache on top of asynchronous data replication, you could have both sites do I/O to the same volumes and access their local storage.

When your technology can only provide snapshots at the secondary site, it's hard to imagine applications other than backup or data mining. But when active/active I/O is possible at both sites, you can do things like grid computing, where the computing power is distributed over both sites and each site does I/O and provides failover for the other. Or you could even put a distributed filesystem across both sites and use computing-agent migration to move services between the sites, to accommodate moving workloads. There's a whole world of more options, but most of them are not well established yet because it's an emerging technology.
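A toy model of the distributed block cache described above, assuming a write-invalidate protocol between the two sites (a sketch of the general idea, not any product's design): each site serves reads from local storage, and only blocks recently written at the other site pay a WAN penalty.

```python
class SiteCache:
    """One site's view of a shared volume. Reads are served locally
    unless the peer site has written the block more recently."""

    def __init__(self, local_store):
        self.local = local_store   # block number -> bytes, at this site
        self.stale = set()         # blocks invalidated by peer writes
        self.peer = None           # the other site's SiteCache

    def write(self, block, data):
        self.local[block] = data
        self.stale.discard(block)
        # The invalidation is a small message; the bulk data itself can
        # follow later via the asynchronous replication stream.
        self.peer.stale.add(block)

    def read(self, block):
        if block in self.stale:
            # Latency hit: fetch the current copy over the WAN.
            self.local[block] = self.peer.local[block]
            self.stale.discard(block)
        return self.local[block]

# Two sites doing active/active I/O against the same volume:
site_a, site_b = SiteCache({}), SiteCache({})
site_a.peer, site_b.peer = site_b, site_a
site_a.write(7, b"written at site A")
print(site_b.read(7))   # fetched over the (simulated) WAN once
print(site_b.read(7))   # now served from site B's local storage
```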
#16
EMC to IBM SAN LUN replication
Arne Joris wrote:
> Nik Simpson wrote:
>> If you want to process the data for applications like data mining or backup, you can take a snapshot of the remote mirror (both platforms can do that) and serve the snapshot up to an application server for processing. What else did you have in mind? And if "there are more options," why not share your knowledge with us?
>
> Well, for example, if your secondary site is not just a remote data vault but an actual production site where people need access to the data, it is pretty lame to have servers at the secondary site go over the WAN to read the data from primary storage when they have a copy of the data right there! With a distributed block cache on top of asynchronous data replication, you could have both sites do I/O to the same volumes and access their local storage. When your technology can only provide snapshots at the secondary site, it's hard to imagine applications other than backup or data mining. But when active/active I/O is possible at both sites, you can do things like grid computing, where the computing power is distributed over both sites and each site does I/O and provides failover for the other. Or you could even put a distributed filesystem across both sites and use computing-agent migration to move services between the sites, to accommodate moving workloads. There's a whole world of more options, but most of them are not well established yet because it's an emerging technology.

Since VMS has been doing exactly these kinds of things since the mid-'80s, calling it 'emerging technology' (rather than, say, catch-up implementations) seems a bit of a stretch.

There is, however, a downside, in that coordinated multi-site access requires inter-site synchronization. For example, while asynchronous replication can work just fine when your remote site is taking block-level snapshots and then treating them like a separate system (avoiding *any* additional load on your primary site, which in the case of access-intensive operations like data mining and backup can be a big win), serving *current* data from that remote site runs the risk of serving up stale data (or even partially-updated garbage) unless the access is coordinated with the primary site (in which case you still incur the latency and primary-site overheads associated with that coordination, but can usually avoid the bulk bandwidth overhead associated with serving up the actual data from the primary site).

VMS's distributed lock manager has for the past 2+ decades done about the best that can be done in such situations, by letting lock management migrate, on a fairly fine-grained basis, to where most of the actual access is occurring. Thus the lock management can migrate to the remote site if that's where the accesses are primarily happening, leaving only the rarer primary-site accesses to incur the inter-site synchronization overheads. But even then, there's a fair amount of inter-site conversation that has to occur, and everything has to be handled synchronously.

- bill
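The lock-mastership point reduces to a toy model like the one below (the real VMS distributed lock manager is far more elaborate; this only illustrates why mostly-local access avoids WAN round-trips):

```python
class DistributedLock:
    """Toy lock resource: the mastering site grants requests with no
    inter-site traffic, while other sites pay a synchronous WAN
    round-trip; mastership migrates to wherever the activity is."""

    MIGRATE_AFTER = 8   # consecutive remote requests before remastering

    def __init__(self, master):
        self.master = master
        self.remote_streak = 0

    def acquire(self, site):
        if site == self.master:
            self.remote_streak = 0
            return "granted locally (no WAN traffic)"
        self.remote_streak += 1
        if self.remote_streak >= self.MIGRATE_AFTER:
            self.master = site              # remaster at the busy site
            self.remote_streak = 0
            return "granted; mastership migrated to " + site
        return "granted after synchronous WAN round-trip"

lock = DistributedLock(master="primary")
for _ in range(8):
    print(lock.acquire("secondary"))   # 8th request triggers remastering
print(lock.acquire("secondary"))       # now granted locally at the secondary
```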
#17
EMC to IBM SAN LUN replication
Bill Todd wrote:
> Since VMS has been doing exactly these kinds of things since the mid-'80s, calling it 'emerging technology' (rather than, say, catch-up implementations) seems a bit of a stretch.

The difference is that once that clustering technology moves into the storage appliance, the applications don't need to know anything about it. Whereas high-end applications can be designed to work on VMS, you can now make low-end, cluster-unaware applications (dumb web servers, for example) run at both sites and present up-to-date data with both sites doing writes.

> There is, however, a downside, in that coordinated multi-site access requires inter-site synchronization. For example, while asynchronous replication can work just fine when your remote site is taking block-level snapshots and then treating them like a separate system (avoiding *any* additional load on your primary site, which in the case of access-intensive operations like data mining and backup can be a big win), serving *current* data from that remote site runs the risk of serving up stale data (or even partially-updated garbage) unless the access is coordinated with the primary site (in which case you still incur the latency and primary-site overheads associated with that coordination, but can usually avoid the bulk bandwidth overhead associated with serving up the actual data from the primary site).

Yes, if your secondary site wants to access data not yet replicated to local storage by the asynchronous mechanism, you'll need to take a latency hit and go get it at the primary site. When the storage appliance provides a distributed cache on top of the async replication, it can hide all these complexities for you and give the secondary site data from local storage, unless it knows it needs to go get it at the other site. All the app would see is that some I/O is fast and other I/O is not.

> VMS's distributed lock manager has for the past 2+ decades done about the best that can be done in such situations, by letting lock management migrate, on a fairly fine-grained basis, to where most of the actual access is occurring. Thus the lock management can migrate to the remote site if that's where the accesses are primarily happening, leaving only the rarer primary-site accesses to incur the inter-site synchronization overheads. But even then, there's a fair amount of inter-site conversation that has to occur, and everything has to be handled synchronously.

There is no lock when the storage appliance maintains the inter-site coherence; if applications need a lock manager (as distributed file systems do, for example), they'll need to provide their own. You can only avoid latency penalties by not accessing the same data at both sites within the async replication window. Luckily, there are many I/O patterns that can avoid this. My agent-migration example, for instance, would have a piece of data accessed from a single site only, until the agent moves to the other side. For any application where data is produced only once, rarely modified, but read often, the latency hits are minimal, and the ease of just being able to read up-to-date data at all sites at any moment is a big benefit. The apps at all sites can be completely unaware of synchronisation windows and do not need to be made aware of each other.

Arne
#18
EMC to IBM SAN LUN replication
Arne Joris wrote:
> Bill Todd wrote:
>> Since VMS has been doing exactly these kinds of things since the mid-'80s, calling it 'emerging technology' (rather than, say, catch-up implementations) seems a bit of a stretch.
>
> The difference is that once that clustering technology moves into the storage appliance, the applications don't need to know anything about it. Whereas high-end applications can be designed to work on VMS, you can now make low-end, cluster-unaware applications (dumb web servers, for example) run at both sites and present up-to-date data with both sites doing writes.

Perhaps you should actually learn something about VMS clustering before presuming to expound on the deficiencies that you imagine it has. VMS's distributed file system has, since the mid-'80s, provided exactly what you describe above: an environment in which multiple instances of dumb applications executing at different sites can concurrently share (and update) the same file(s), with the same integrity guarantees that they'd enjoy if they were all running on a single machine.

For that matter, exactly the same lock-management mechanisms that allow somewhat smarter applications to coordinate multiple cooperating instances on a single machine can be used, transparently, to coordinate a similar group of cooperating instances spread across the (potentially inter-site) cluster, so even in that case the application need not be 'cluster-aware', just multi-application-instance-aware.

Should you for some reason wish to use the VMS systems at the separate sites as mere file servers (which sounds like what you were referring to when talking about 'moving the clustering technology into the storage appliance' above), rather than take advantage of the performance benefit of having the application instances run on them directly, you could of course do that as well, in which case the client machines at both sites would connect to and use them just as if they were all connecting to a single file server.

> Yes, if your secondary site wants to access data not yet replicated to local storage by the asynchronous mechanism, you'll need to take a latency hit and go get it at the primary site. When the storage appliance provides a distributed cache on top of the async replication, it can hide all these complexities for you and give the secondary site data from local storage, unless it knows it needs to go get it at the other site. All the app would see is that some I/O is fast and other I/O is not.

And exactly how do you think that the secondary site's portion of your hypothetical distributed cache would *know* that its local data was stale, without some form of synchronous communication with the primary site? Your hands are waving pretty fast, but I suspect you really don't know much about this subject at the level of detail that you're attempting to discuss.

If one assumes a major bandwidth bottleneck between the sites, such that small communications about data currency can be synchronous but large updates cannot easily be, there might be *some* rationale for the kind of system that you seem to be describing. But in that case, secondary-site accessors aren't going to get very good service when they want something that has changed recently. Furthermore, since small synchronous messages are required anyway, the VMS approach, which can serve up data from the secondary site whenever that makes sense, and even coordinate updates and lock management there if that's where most of the action is taking place, is arguably superior for many kinds of workloads (in fact, one would have to work kind of hard to find a realistic workload for which that was not the case).

> There is no lock when the storage appliance maintains the inter-site coherence;

It sounds as if you may be confused again: applications don't *see* any locking using inter-site VMS distributed file access any more than they see it doing local access. The 'locking' involved is strictly internal to the file system (just as it is for a local-only file system), and is largely involved in guaranteeing the atomicity of data updates (if two accessors try to update the same byte range, for example, the result really ought to be what one or the other wrote, rather than some mixture of the two) or internal file-system updates (say, an Open and a Delete operation racing each other, where they really need to be serialized rather than mixed together).

Or perhaps you're suggesting that with block-level inter-site replication no inter-site locking is required to support inter-site file-level access. If so, you're simply wrong: even if the remote site applies updates in precisely the same order in which they're applied at the primary site, lack of access to the primary site's in-memory file-system context remains a problem (i.e., the secondary site is still stale in that sense unless the primary site's file system does something special to ensure that it isn't, outside the confines of your 'storage appliance').

> if applications need a lock manager (as distributed file systems do, for example), they'll need to provide their own.

What's this "if"? The only cases in which they will *not* require something like a distributed lock manager are exactly those which Nik described: using snapshot-style facilities at the secondary site to create an effectively separate (usually read-only) environment, which can then be operated upon in whatever manner one wants.

> You can only avoid latency penalties by not accessing the same data at both sites within the async replication window.

Same problem I noted above: unless you've got synchronous oversight to catch such issues (even if the actual updates can be somewhat asynchronous), it's all too easy to get stale data as input to some operation that will then use it to modify other parts of the system.

> Luckily, there are many I/O patterns that can avoid this. My agent-migration example, for instance, would have a piece of data accessed from a single site only, until the agent moves to the other side.

That may work OK for read-only access, but then so does a block-level snapshot (just take one when you move the agent). Where data gets updated (or, perhaps worse, appended to), you've got allocation activity at the new site which must be carefully coordinated with the primary site.

> For any application where data is produced only once, rarely modified, but read often, the latency hits are minimal, and the ease of just being able to read up-to-date data at all sites at any moment is a big benefit.

If the updates really are that rare, then synchronous replication and VMS-style site-local access will work fine.

> The apps at all sites can be completely unaware of synchronisation windows and do not need to be made aware of each other.

Just as has been the case with VMS for decades, as I said.

- bill
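The central objection here, that the secondary site cannot know its local copy is current without asking the primary, corresponds to a read path like the sketch below, where the currency check is a small synchronous message even though the bulk data moves asynchronously. (This is the pull-based variant; the write-invalidate sketch earlier in the thread is the push-based one, and it likewise puts a synchronous message on the critical path. `primary` is a hypothetical stub, not a real API.)

```python
def read_block(block, local_data, local_versions, primary):
    """Secondary-site read with a synchronous currency check.

    `primary` is assumed to expose version_of(), a tiny metadata
    round-trip, and fetch(), the expensive bulk read over the WAN.
    """
    current = primary.version_of(block)     # synchronous, but small
    if local_versions.get(block) == current:
        return local_data[block]            # fast path: local storage
    # Stale or missing locally: pay bulk WAN bandwidth for this block only.
    local_data[block] = primary.fetch(block)
    local_versions[block] = current
    return local_data[block]
```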
#19
EMC to IBM SAN LUN replication
Arne Joris wrote:
> Nik Simpson wrote:
>> If you want to process the data for applications like data mining or backup, you can take a snapshot of the remote mirror (both platforms can do that) and serve the snapshot up to an application server for processing. What else did you have in mind? And if "there are more options," why not share your knowledge with us?
>
> Well, for example, if your secondary site is not just a remote data vault but an actual production site where people need access to the data, it is pretty lame to have servers at the secondary site go over the WAN to read the data from primary storage when they have a copy of the data right there! With a distributed block cache on top of asynchronous data replication, you could have both sites do I/O to the same volumes and access their local storage.

If that's what you want to do, then it makes a hell of a lot more sense to do the replication at the file-system level, not at the LUN level, because you need something in the replication layer that understands the file system's synchronization issues. Sounds like a job for a WAFS, not a block-level replication technology, where the challenge is keeping the file system in sync between two separate writers separated by a high-latency WAN link.

--
Nik Simpson
#20
EMC to IBM SAN LUN replication
Thanks Bill, you saved me the trouble ;-)
--
Nik Simpson