Posted to comp.arch.storage, January 28th 08, 12:45 PM
From: [email protected]
Subject: Brocade FC switches, automatically switch config at bootup

Somewhat simplified, I have this situation using Brocade switches
running FabricOS 5.3.0d:
Server A and storage A are connected to switch A. Server B and storage
B are connected to switch B. Switch A and switch B are connected via
an ISL, forming a fabric of two switches. Storage A is configured to
replicate all data to storage B. Thus, storage B has an exact copy of
all the LUNs provisioned by storage A.
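In ASCII form:

  server A --- switch A --- storage A
                  |
                 ISL
                  |
  server B --- switch B --- storage B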

If I crash site A and issue a failover command to storage B, server B
can successfully access the LUNs on storage B. All is well.

However, site A may come up again without any manual intervention,
for example when a power failure has been fixed and everything boots
up automatically. Storage A will then still consider itself the
"primary" and provision LUNs to server A, while server B is the
actual live server, using the replicated LUNs.

From a storage perspective, storage A and storage B will not resync
until I tell them to, so that's not a major concern. I just have to
make sure storage B is replicating changes back to storage A, and not
the other way around.

My major concern is that I now have two copies of my LUNs, with a time
difference corresponding to the downtime of site A. In more detail,
the physical servers are running VMware, so I'm potentially looking at
dozens if not hundreds of network entities coming online in two
versions.

I have tried to solve this by creating a recovery zoning config in
which storage A is isolated from all servers, so that no server can
mount LUNs on storage A. When I crash site A, the surviving switch B
is issued a "cfgenable recovery_config" command. I had hoped that
when site A and switch A came back up, switch A would think "ok, I
see there's a different config enabled than the one I had when I was
last alive; I'll switch to that, since the other switch is
principal", assuming switch B becomes principal since it's the only
remaining switch in the fabric.
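For reference, this is roughly how the recovery config is built on
switch B (the alias and zone names are my own, and the WWN
placeholders need to be replaced with real ones):

  switchB:admin> alicreate "srvB", "<WWN of server B's HBA>"
  switchB:admin> alicreate "stgB", "<WWN of storage B's port>"
  switchB:admin> zonecreate "z_recovery", "srvB; stgB"
  switchB:admin> cfgcreate "recovery_config", "z_recovery"
  switchB:admin> cfgsave

The "cfgenable recovery_config" is then what gets issued at failover
time.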

When I try this, switch A will not merge with switch B when it boots
up again. The fabric segments, and the E-port reports a zoning
conflict. I can manually enable the recovery config on switch A and
bounce the E-port to rebuild the fabric, but ideally I'd like switch
A to merge with switch B and switch to the recovery config
automatically, as soon as it boots.
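For the record, the manual workaround looks like this (the port
number is just an example; substitute the E-port of the ISL):

  switchA:admin> cfgenable "recovery_config"
  switchA:admin> portdisable 0
  switchA:admin> portenable 0

After the port comes back up, the E-port forms and the fabric merges.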

I have been looking at setting switch B as principal with
"fabricPrincipal", but doing a full-scale crash, failover and
failback is quite a lengthy process, and I'd like to hear what you
people think.
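If I go that route, I assume it would be something like the
following, with fabricshow marking the principal switch with a ">"
(syntax from memory; please correct me if 5.3.0d differs):

  switchB:admin> fabricprincipal 1
  switchB:admin> fabricshow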

Site A is the primary production site and site B is a recovery site.
I will not try to make this an active-active configuration.