#1
fileserver backup: ndmp or network backup?
Hi all,

At the moment our 1.5 TB fileserver holds about 3.5 million files, which are backed up over the network. Now I have to plan an upgrade to at least 3 TB and am considering investing in an NDMP-capable filer. But is NDMP really faster in this environment (assuming the underlying RAID systems are equally fast and the network connection between fileserver and backup server is not a bottleneck)? I have no experience with NDMP so far and would really appreciate help from some experts.

Thanks in advance!

Christoph

--
Christoph Peus
Universität Witten/Herdecke
Bereich Informationstechnologie
Stockumer Str. 10
58453 Witten, Germany
Tel: +49-2302 926212
http://www.uni-wh.de
#2
On Fri, 08 Dec 2006 15:22:21 +0100, Christoph Peus wrote:

> But is NDMP really faster in this environment (if the underlying
> RAID systems are equally fast and the network connection between
> fileserver and backup server is not a bottleneck)?

NDMP is almost always faster, primarily because of the difference in overhead, but also because of dump's IO pattern versus per-file network reads.

One of the greatest things about NDMP, imo, is that it takes advantage of any snapshots your system may be capable of performing. This means a guaranteed consistent backup, unlike a plain network backup, where one part of a file may change right before or right after the read happens, leaving you with an inconsistent file on tape. The same holds true for a directory structure or data set. Of course, if your file server cannot take snapshots it won't be any more consistent with NDMP, just faster.

NDMP is not a transport protocol, it is a command protocol. That means it relies on a transport protocol, usually IP, to move the data, while the commands used to initiate and control that transfer are carried by NDMP. For most unix-based filers NDMP simply calls dump for the backup and uses IP for transport. I can't say what is used for windows-based filers, but I would imagine a similar setup, maybe Windows Backup?

Network Data Management Protocol. The "management" part is key; remember that it is not a transport protocol and you will avoid several pitfalls.

~F
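The control-channel/data-channel split described above can be sketched in a few lines of Python. This is a toy illustration only: the "BACKUP host:port" message is invented for clarity, and real NDMP uses XDR-encoded messages defined in the NDMP specification. The point is just that commands and bulk data travel over separate connections.

```python
# Toy sketch of the NDMP-style control/data split (NOT the real wire format).
import socket
import threading

PAYLOAD = b"x" * 65536  # stands in for the filer's dump output


def filer(ctl_srv):
    """The filer: accepts a command on the control channel, then streams
    the bulk data over a separate data connection."""
    ctl, _ = ctl_srv.accept()
    cmd = ctl.recv(1024).decode()           # toy: assume one recv gets it all
    verb, addr = cmd.split()
    assert verb == "BACKUP"
    host, port = addr.rsplit(":", 1)
    data = socket.create_connection((host, int(port)))  # data channel
    data.sendall(PAYLOAD)                   # bulk transfer bypasses control
    data.close()
    ctl.sendall(b"DONE")                    # completion reported on control
    ctl.close()


def backup_server():
    """The backup server: tells the filer where to send the data, then
    receives the stream and waits for completion status."""
    data_srv = socket.socket()
    data_srv.bind(("127.0.0.1", 0))
    data_srv.listen(1)
    data_port = data_srv.getsockname()[1]

    ctl_srv = socket.socket()
    ctl_srv.bind(("127.0.0.1", 0))
    ctl_srv.listen(1)
    t = threading.Thread(target=filer, args=(ctl_srv,))
    t.start()

    ctl = socket.create_connection(ctl_srv.getsockname())
    ctl.sendall(f"BACKUP 127.0.0.1:{data_port}".encode())

    conn, _ = data_srv.accept()
    received = b""
    while True:
        chunk = conn.recv(8192)
        if not chunk:
            break
        received += chunk
    status = ctl.recv(16)                   # b"DONE" once the stream ended
    t.join()
    conn.close(); ctl.close(); data_srv.close(); ctl_srv.close()
    return status, len(received)


status, nbytes = backup_server()
print(status, nbytes)
```

Note how the payload never touches the control socket; that separation is why the protocol's overhead stays small regardless of how much data moves.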
#3
Faeandar wrote:

> NDMP is not a transport protocol, it is a command protocol. [...]
> The "management" part is key; remember that it is not a transport
> protocol and you will avoid several pitfalls.

Thanks for your comment. We already use Linux LVM snapshotting for our network backup, so I don't see a special advantage of NDMP here, or did I get something wrong in your explanation?

Regarding performance, it's most important for me to know whether the difference is typically "only" 20% or an order of magnitude. When a network backup does a full backup of a filesystem with millions of files, it typically reads every single file separately, which forces a lot of seeks on the involved disks.

Idea: if an NDMP-enabled filer had block-level access to the filesystem's snapshot volume, it could be clever enough to do a full scan of the filesystem, record which blocks are in use by which file, and afterwards do the full backup by *sequentially* reading the volume at block level from first to last cylinder, skipping unused blocks, which would need significantly fewer seeks. Are there systems that work this way?

Christoph
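A back-of-the-envelope model shows why the seek count dominates at this file count. The disk parameters below are my own illustrative assumptions (a single ~8 ms-seek spindle array streaming at 80 MB/s), not measurements from the poster's hardware:

```python
# Rough model: 3.5 million files read file-by-file versus one sequential
# block-level sweep of the same 1.5 TB of live data.
AVG_SEEK_S = 0.008          # assumed seek + rotational latency, seconds
SEQ_MB_S = 80.0             # assumed sustained sequential rate, MB/s
N_FILES = 3_500_000
LIVE_DATA_MB = 1.5e6        # 1.5 TB of live data, in MB


def file_by_file_hours(seeks_per_file=1.5):
    """One-plus seeks per file (inode lookup + first extent), plus transfer."""
    seek_time = N_FILES * seeks_per_file * AVG_SEEK_S
    transfer_time = LIVE_DATA_MB / SEQ_MB_S
    return (seek_time + transfer_time) / 3600


def sequential_sweep_hours(extra_seek_fraction=0.05):
    """Mostly sequential reading; a small allowance for skipping free runs."""
    transfer_time = LIVE_DATA_MB / SEQ_MB_S
    return transfer_time * (1 + extra_seek_fraction) / 3600


print(f"file-by-file : {file_by_file_hours():.1f} h")
print(f"sequential   : {sequential_sweep_hours():.1f} h")
```

With these assumed numbers the gap is a small integer factor, not an order of magnitude, which is consistent with the estimates given later in the thread; the real ratio depends heavily on average file size and fragmentation.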
#4
Christoph Peus wrote:

> When a network backup does a full backup of a filesystem with millions
> of files, it typically reads every single file separately, which forces
> a lot of seeks on the involved disks.

That's true, but in your particular case your average file size appears to be over 400 KB, which means that (unless the files suffer from significant internal fragmentation - which you might find it worthwhile to eliminate for other reasons) a file-by-file backup (with reasonable caching of the directory and inode structure) should achieve about 1/3 of the best possible transfer bandwidth anyway (and won't have to transfer anything but live data, meaning that it could approach half the ideal performance).

> Idea: if an NDMP-enabled filer had block-level access to the
> filesystem's snapshot volume, it could [...] do the full backup by
> *sequentially* reading the volume at block level [...]

That could leave the individual files equally fragmented on the backup medium. A better approach might be for the file system itself to keep its data better-consolidated (not just for backup purposes, but for other common situations where many smallish files within a single directory may be accessed together) - if not at run-time, then via use of a suitably-intelligent defragmenter.

- bill
#5
> ...it could be clever enough to do a full scan of the filesystem,
> saving information about which blocks are in use by which file

No need for this, at least on Windows. Windows has FSCTL_GET_VOLUME_BITMAP, so excluding the free blocks from a disk image is trivial and does not require a walk of the file/directory tree. I hope the UNIXen have the same kind of IOCTL. At least most UNIX filesystems keep the free-space bitmap scattered across fixed, well-known locations on the volume, so supporting such a call would be trivial.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
http://www.storagecraft.com
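The bitmap FSCTL_GET_VOLUME_BITMAP returns is one bit per cluster (1 = allocated, LSB-first within each byte), so turning it into the list of allocated runs an imaging tool would copy really is trivial. A sketch in Python - the bitmap bytes here are made up for illustration; a real tool would obtain them from DeviceIoControl on the volume handle:

```python
def allocated_runs(bitmap: bytes, n_clusters: int):
    """Decode a one-bit-per-cluster allocation bitmap (LSB-first within
    each byte, as FSCTL_GET_VOLUME_BITMAP returns it) into a list of
    (start_cluster, length) runs of allocated clusters."""
    runs = []
    start = None
    for lcn in range(n_clusters):
        allocated = (bitmap[lcn // 8] >> (lcn % 8)) & 1
        if allocated and start is None:
            start = lcn                     # run begins
        elif not allocated and start is not None:
            runs.append((start, lcn - start))  # run ends at a free cluster
            start = None
    if start is not None:                   # run extends to end of volume
        runs.append((start, n_clusters - start))
    return runs


# Toy bitmap: clusters 0-3 allocated, 4-9 free, 10-11 allocated.
bitmap = bytes([0b00001111, 0b00001100])
print(allocated_runs(bitmap, 12))
```

An imager then reads only those runs sequentially and skips everything else, with no file-tree walk involved.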
#6
On Mon, 11 Dec 2006 17:14:16 +0100, Christoph Peus wrote:

> Thanks for your comment. We already use Linux LVM snapshotting for our
> network backup, so I don't see a special advantage of NDMP here, or did
> I get something wrong in your explanation?

Not sure where Linux is involved. Is that the current file server platform? I'll assume it is. Snapshotting in that case will get you file consistency, but it does not help with performance if you're still doing a network backup. If you were doing a dump, the performance would be improved. How much it would improve I can't say.

> Regarding performance, it's most important for me to know whether the
> difference is typically "only" 20% or an order of magnitude.

It should not be an order of magnitude difference. Something under 50%, I would guess offhand.

> When a network backup does a full backup of a filesystem with millions
> of files, it typically reads every single file separately, which forces
> a lot of seeks on the involved disks.

Right, which is why a FS dump or equivalent is faster. No special magic involved, just removing the overhead of network traffic and cpu switching. IIRC, dump does a near-sequential disk read of the data during its mapping phase (or at least enough parallel reads to seem so), which is significantly faster than the random reads needed to service single-file requests.

> Are there systems that work this way?

Can you give us a list of the potential vendors you are looking at? We may be able to give more specifics for each of those.

~F
#7
One thing you need to check is whether your NDMP backup job is restartable or not. We had configured NDMP on an 8 TB filesystem, but belatedly realised that the filer's NDMP did not support restartable jobs, so if anything happened to the job (tape error, drive error, library got rebooted, etc.) the backup would restart from scratch. Needless to say, it proved to be a big headache. Should have configured a filesystem backup...

"Faeandar" wrote in message:

> NDMP is almost always faster, primarily because of the difference in
> overhead but also because of dump's IO pattern vs. network reads. [...]
> The "management" part is key; remember that it is not a transport
> protocol and you will avoid several pitfalls.
>
> ~F