Network mirroring



 
 
#1 - December 24th 07, 09:22 PM - S (comp.arch.storage)

I started thinking about this after a conversation with an
acquaintance who runs a large database-driven website. Currently he
has only one data center; all writes from all over the world go
there.

So he's planning 3 things:
0. Open up more DCs.
1. Some sort of geographic identification of an incoming IP request.
2. Redirect that request to a DC that's closest to the client (a rough sketch of steps 1-2 follows this list).
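For steps 1 and 2, I imagine something like this toy Python sketch, assuming
client IPs can be mapped to a coarse region. A real setup would use GeoDNS or
a maintained GeoIP database; the prefix table and hostnames below are invented:

import ipaddress

# Invented demo table: CIDR prefix -> region. A real deployment would use
# a GeoIP database or DNS-based routing, not a hand-maintained dict.
REGIONS = {
    "203.0.113.0/24": "apac",
    "198.51.100.0/24": "emea",
}
DCS = {
    "apac": "dc-sydney.example.com",
    "emea": "dc-frankfurt.example.com",
    "default": "dc-us.example.com",
}

def nearest_dc(client_ip: str) -> str:
    ip = ipaddress.ip_address(client_ip)
    for prefix, region in REGIONS.items():
        if ip in ipaddress.ip_network(prefix):
            return DCS[region]
    return DCS["default"]

print(nearest_dc("203.0.113.9"))  # dc-sydney.example.com
print(nearest_dc("192.0.2.1"))    # dc-us.example.com (no region match)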

The unresolved issue is how to keep his data in sync. Some more issues
to muddy the waters:
1. The storage is heterogeneous, so he can't just go with SnapMirror.
2. The updates across DCs must happen in near-real-time.
3. It can't cost an arm and a leg.

One big advantage is that since everything is DB-driven, the DB could
tell the mirroring s/w which files to transfer across, which
eliminates a huge issue.

So I was thinking... has anyone played around with rsync to make this
happen? I'd imagine you'd have to make some serious code changes to
the client, but it would be really interesting.
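For instance, if the DB can emit the list of touched files, rsync's
--files-from option could take that list directly, so there is no tree scan
at all. A hedged sketch only (host and paths invented):

import subprocess

DOCROOT = "/srv/site"                   # hypothetical local data root
REMOTE = "dc2.example.com:/srv/site"    # hypothetical remote DC

def mirror(changed_files: list[str]) -> None:
    # --files-from=- reads the file list from stdin; paths are relative
    # to DOCROOT, and only those files are considered for transfer.
    subprocess.run(
        ["rsync", "-az", "--files-from=-", DOCROOT, REMOTE],
        input="\n".join(changed_files).encode(),
        check=True,
    )

mirror(["images/123.jpg", "uploads/report.pdf"])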

S
#2 - December 26th 07, 02:14 AM - lahuman9 (comp.arch.storage)

On Dec 24, 4:22 pm, S wrote:
[snip]


At first glance, without homogeneous storage and with the expectation
that it be cheap, I would say 'no way'.

But it depends on what you mean by "near-real-time" and how many files
are involved. rsync has a delay while it checks which files have
changed, but if your software can already tell which files to transfer,
why not just use scp? Assuming you'd otherwise have set up rsync to
copy over ssh anyway, going straight to scp gets rid of rsync's
scanning overhead.

That said, I would not call scp near-real-time either.
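The shape of it, purely as a sketch, assuming the application can emit the
changed paths (hostname and paths invented):

import subprocess

# Hypothetical list of changed files, as reported by the application/DB.
changed = ["/srv/site/images/123.jpg", "/srv/site/uploads/report.pdf"]

for path in changed:
    # Push each file to the remote DC over ssh; -p preserves times/modes.
    subprocess.run(["scp", "-p", path, f"mirror.example.com:{path}"],
                   check=True)

One ssh session per file is wasteful, though, which is part of why I would
not call this near-real-time.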
#3 - December 26th 07, 06:59 AM - Alvin Andries (comp.arch.storage)


"S" wrote in message
...
[snip]


If your acquaintance is using big $$$ DBs like Oracle or DB2, they have
options for distributed DB sites.
If he's running less elaborate DBs, then I would consider looking into
replaying change-log files; otherwise, the syncing will take longer and
longer as the DB grows. Still, without more details, I can only say that
you should be aware of invalid states that can occur, e.g. you start with
10 items in stock and, within the same sync slot, people at locations 1
and 2 each order 6 items, which they are told will be delivered in 3 days.
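A toy demonstration of that invalid state, assuming replicas that accept
writes locally and only merge on the next sync (everything here is made up
purely for illustration):

# Two DCs each hold a local copy of the stock counter and sync lazily.
stock = {"dc1": 10, "dc2": 10}  # both replicas start in sync

def order(dc: str, qty: int) -> bool:
    # Accept the order if the *local* replica thinks there is stock.
    if stock[dc] >= qty:
        stock[dc] -= qty
        return True
    return False

# Within the same sync slot, both sites sell 6 of the 10 items.
assert order("dc1", 6)  # dc1 sees 10 >= 6 and accepts
assert order("dc2", 6)  # dc2 also sees 10 >= 6 and accepts

# The next sync merges both decrements: 10 - 6 - 6 = -2 items "in stock".
merged = 10 - (10 - stock["dc1"]) - (10 - stock["dc2"])
print(merged)  # -2: two customers were promised stock that doesn't exist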

Regards,
Alvin.


#4 - December 26th 07, 11:49 AM - Dieter Stumpner (comp.arch.storage)

S wrote:
[snip]


Hi!

It is complicated to replicate the database because of a DB's locking
mechanism. As the previous poster mentioned, you can't sell one item to
two people.
I don't know your workload, but I would prefer another approach: use
reverse proxies to distribute your load. A huge example is Wikipedia [1]:
only "one" MySQL master DB and a lot of Apaches and Squids.

[1] http://meta.wikimedia.org/wiki/Wikimedia_servers
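A minimal sketch of the read/write split that makes such a single-master
layout work (hostnames and the routing rule are placeholders I made up, not
Wikipedia's actual setup):

# Route writes to the single master DC; reads stay on the nearest replica.
MASTER = "db-master.dc1.example.com"          # hypothetical
LOCAL_REPLICA = "db-replica.dc2.example.com"  # hypothetical

def pick_host(sql: str) -> str:
    # Anything that modifies data goes to the master; reads stay local.
    verb = sql.lstrip().split(None, 1)[0].upper()
    is_write = verb in ("INSERT", "UPDATE", "DELETE", "REPLACE")
    return MASTER if is_write else LOCAL_REPLICA

print(pick_host("SELECT * FROM items"))       # db-replica.dc2.example.com
print(pick_host("UPDATE items SET qty = 4"))  # db-master.dc1.example.com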

with best regards
Dieter Stumpner
#5 - December 26th 07, 08:48 PM - S (comp.arch.storage)

Hi Dieter,
The wiki example is awesome. Wikipedia's problem is kinda easy, though,
because presumably they have very few writes and a LOT of reads, for
which apache/squid works well. This guy has a different problem: he has
lots of reads and lots of writes. It's not a big $$$ DB, so I'm
thinking: replay the change logs and ship them with scp/rsync.
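The replay side could be as dumb as this sketch. The log format is one I
invented for illustration (one SQL statement per line, in commit order), and
sqlite3 just stands in for whatever DB he actually runs:

import sqlite3
from pathlib import Path

def replay(log_path: Path, db_path: str) -> None:
    # Apply one shipped change-log segment as a single transaction.
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            for line in log_path.read_text().splitlines():
                if line.strip():
                    conn.execute(line)
    finally:
        conn.close()

# replay(Path("/var/lib/db/changelogs/changelog.000042"), "/srv/replica.db")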

Thanks all, this was an interesting discussion.
S


On Dec 26, 3:49 am, Dieter Stumpner wrote:
[snip]


#6 - January 2nd 08, 02:30 PM - [email protected] (comp.arch.storage)

JimK is correct. I work with FalconStor, and it is possible to set up
synchronous or asynchronous remote mirrors between sites using
disparate hardware. The mirroring is done as a software service through
a gateway server/appliance, so the back-end hardware has little or no
bearing on the functionality. I can't comment on how NetApp manages to
keep the data in sync, but I can regarding FalconStor, for anyone
interested. Once the first sync is done, everything after that syncs
only deltas based on changed sectors (not blocks), which is very
bandwidth-efficient. A cache area is defined so that applications do
not see the lag between sites, yet the sites remain in sync. In the
event of a complete communications loss, the mirror is suspended; once
the link is reestablished, it performs a comparison and fixes the
deltas rather than redoing the whole mirror (the sketch below shows the
general idea of such a comparison). For complete data integrity there
are also application-aware snapshot agents that will properly quiesce
databases for good restore points. Once established, you could perform
backups at the centralized site and get out of backup handling and
backup windows at the remote site(s). Good DR is actually easier and
more affordable than you may think.
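To be clear, this is NOT FalconStor's actual algorithm, just the generic
hash-and-compare idea at sector granularity, as a toy:

import hashlib

SECTOR = 512  # bytes; the granularity of the comparison

def digests(volume: bytes) -> list[bytes]:
    # One digest per fixed-size sector of a volume image.
    return [hashlib.sha256(volume[i:i + SECTOR]).digest()
            for i in range(0, len(volume), SECTOR)]

def changed_sectors(primary: bytes, mirror: bytes) -> list[int]:
    # Sector indexes that must be resent to fix the mirror.
    return [i for i, (a, b) in enumerate(zip(digests(primary), digests(mirror)))
            if a != b]

primary = bytearray(4 * SECTOR)
mirror = bytearray(primary)          # mirror starts identical
primary[1000:1004] = b"new!"         # a write the mirror missed
print(changed_sectors(bytes(primary), bytes(mirror)))  # [1]: only sector 1 resent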


On Dec 27 2007, 6:48 pm, JimK wrote:
Any of the virtualization engines (NetApp gateways, IBM SVC, FalconStor,
etc.) will aggregate heterogeneous back-end disk arrays and do remote
mirroring of one sort or another. The LUNs are distributed to the
virtualization engines and by them to the servers.

You use the example of SnapMirror, which is a NetApp feature, so I
assume they have NetApp devices, probably filers with their own disk.
The gateways use existing back-end disk. If they have NetApp boxes at
primary and secondary sites, they CAN use SnapMirror with heterogeneous
back-end disk.





#7 - January 3rd 08, 01:53 AM - S (comp.arch.storage)

Wow neat!

Can you use FalconStor to mirror data between two NetApps or, say,
between a NetApp and a Linux box?

I'd be very curious to know how this works.

Thanks.
S

On Jan 2, 6:30 am, [email protected] wrote:
[snip]

#8 - January 3rd 08, 05:58 AM - Cydrome Leader (comp.arch.storage)

S wrote:
[snip]

So he's planning 3 things:
0. Open up more DCs.
1. Some sort of geographic identification of an incoming IP request.
2. Redirect that request to a DC that's closest to the client.


Instead of inventing a CDN, just use a real one.

The unresolved issue is how to keep his data in sync. Some more issues
to muddy the waters:
1. The storage is heterogeneous, so he can't just go with SnapMirror.
2. The updates across DCs must happen in near-real-time.


Updates of what? Static content, or something trapped in a database?

3. It can't cost an arm and a leg.


Good luck with that part. It won't happen.

One big advantage is that since everything is DB-driven, the DB could
tell the mirroring s/w which files to transfer across, which
eliminates a huge issue.


Or just have the database in one location. Why do you need to split the DB
up all over the place?

So I was thinking...has anyone played around with rsync to make this
happen? I'd imagine you'd have to make some serious code changes to
the client, but it would be real interesting.


You can't rsync a live database. No matter what any hardware gadget vendor
tries to tell you about data replication, it doesn't work that way for
databases.

Just look up the problems people have with database clusters whose nodes
are just feet away from each other. Now add latency, drop the connection
between those machines every now and then, and see how things work out.
This applies to big-boy databases like Oracle, not just toys like
MySQL.
#9 - January 4th 08, 04:23 PM - [email protected] (comp.arch.storage)

There may be ways of doing this. FalconStor needs to see iSCSI or FC
(or IB) storage behind it, but it can act as a storage router serving
out over any storage protocol, including acting as a NAS. Even without
FalconStor, if the NetApp box can present a disk to the Linux box, you
could use the LVM tools provided to mirror a LUN locally. If you need
distant replication, you would need some go-between to handle the
communications efficiently; in that case you could use LVM to a local
IPStor appliance and replicate to a remote appliance that may point to
the NetApp if the right protocols are available. (Note: not all LVMs
are created equal, and some may not be able to do this.)
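A rough outline of the local mirror step. The volume-group and device names
are placeholders, and exact option spellings vary by LVM release; check
vgextend(8) and lvconvert(8) on your system before trusting any of this:

import subprocess

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Placeholders: vg0/data is an existing LV on local disk; /dev/sdx is the
# LUN presented by the NetApp box.
run(["vgextend", "vg0", "/dev/sdx"])                   # add the LUN to the VG
run(["lvconvert", "-m", "1", "vg0/data", "/dev/sdx"])  # convert to a 2-way mirror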

There is also a pretty cool tool called FileSafe that would allow
periodic copying of a file/folder, but you may need to bring your own
open-file manager tool, as it does not include one for Linux (yet).


On Jan 2, 8:53 pm, S wrote:
[snip]

#10 - January 17th 08, 01:58 AM - belpatCA (comp.arch.storage)

On Dec 24 2007, 2:22 pm, S wrote:
[snip]


I doubt you can truly redirect traffic to any DC and maintain data
coherency without doing some sort of migration before re-routing
traffic to another DC. Some appliances (like the YottaYotta box) claim
to migrate all the necessary data at the block level on demand, which
would make the task possible. But I assume those appliances fail the
arm-and-leg test :-)

One big advantage is that since everything is DB-driven, the DB could
tell the mirroring s/w which files to transfer across, which
eliminates a huge issue.


I know some large financial institutions migrate DBs between their
datacenters, but it involves a lot of costly equipment and a very
complicated environment.
Hopefully the latest virtualization craze hyping anything related to
VMware (and VMotion in particular) will introduce some more affordable
tools...

GG
 



