#1
fileserver backup: ndmp or network backup?
Hi all,

At the moment our 1.5 TB fileserver holds about 3.5 million files, which are backed up over the network. Now I have to plan an upgrade to at least 3 TB and am considering investing in an NDMP-capable filer. But is NDMP really faster in this environment (assuming the underlying RAID systems are equally fast and the network connection between fileserver and backup server is not a bottleneck)? I have no experience with NDMP so far and would really appreciate help from some experts.

Thanks in advance!

Christoph

--
Christoph Peus
Universität Witten/Herdecke
Bereich Informationstechnologie
Stockumer Str. 10
58453 Witten, Germany
Tel: +49-2302 926212
http://www.uni-wh.de
#2
On Fri, 08 Dec 2006 15:22:21 +0100, Christoph Peus wrote:

> But is NDMP really faster in this environment (if the underlying
> RAID systems are equally fast and the network connection between
> fileserver and backup server is not a bottleneck)?

NDMP is almost always faster, primarily because of the difference in overhead, but also because of dump's IO pattern versus per-file network reads.

One of the greatest things about NDMP, imo, is that it takes advantage of any snapshots your system may be capable of performing. This means a guaranteed consistent backup, unlike a plain network backup, where one part of a file may change right before or right after the read happens, leaving you with an inconsistent file on tape. The same holds true for a directory structure or data set. Of course, if your file server cannot take snapshots it won't be any more consistent with NDMP, just faster.

NDMP is not a transport protocol, it is a command protocol. That means it relies on a transport protocol, usually IP, to move the data, while the commands used to initiate and control that transfer are carried by NDMP. For most unix-based filers NDMP simply calls dump for the backup and uses IP for transport. I can't say what is used for windows-based filers, but I would imagine a similar setup, maybe Windows Backup?

Network Data Management Protocol. The "management" part is key; remember that it is not a transport protocol and you will avoid several pitfalls.

~F
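The control-channel/data-channel split described above can be sketched in a few lines of Python. This is a toy illustration only: the "BACKUP host:port" message is invented for clarity, and real NDMP uses XDR-encoded messages defined in the NDMP specification. The point is just that commands and bulk data travel over separate connections.

```python
# Toy sketch of the NDMP-style control/data split (NOT the real wire format).
import socket
import threading

PAYLOAD = b"x" * 65536  # stands in for the filer's dump output


def filer(ctl_srv):
    """The filer: accepts a command on the control channel, then streams
    the bulk data over a separate data connection."""
    ctl, _ = ctl_srv.accept()
    cmd = ctl.recv(1024).decode()           # toy: assume one recv gets it all
    verb, addr = cmd.split()
    assert verb == "BACKUP"
    host, port = addr.rsplit(":", 1)
    data = socket.create_connection((host, int(port)))  # data channel
    data.sendall(PAYLOAD)                   # bulk transfer bypasses control
    data.close()
    ctl.sendall(b"DONE")                    # completion reported on control
    ctl.close()


def backup_server():
    """The backup server: tells the filer where to send the data, then
    receives the stream and waits for completion status."""
    data_srv = socket.socket()
    data_srv.bind(("127.0.0.1", 0))
    data_srv.listen(1)
    data_port = data_srv.getsockname()[1]

    ctl_srv = socket.socket()
    ctl_srv.bind(("127.0.0.1", 0))
    ctl_srv.listen(1)
    t = threading.Thread(target=filer, args=(ctl_srv,))
    t.start()

    ctl = socket.create_connection(ctl_srv.getsockname())
    ctl.sendall(f"BACKUP 127.0.0.1:{data_port}".encode())

    conn, _ = data_srv.accept()
    received = b""
    while True:
        chunk = conn.recv(8192)
        if not chunk:
            break
        received += chunk
    status = ctl.recv(16)                   # b"DONE" once the stream ended
    t.join()
    conn.close(); ctl.close(); data_srv.close(); ctl_srv.close()
    return status, len(received)


status, nbytes = backup_server()
print(status, nbytes)
```

Note how the payload never touches the control socket; that separation is why the protocol's overhead stays small regardless of how much data moves.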
#3
Faeandar wrote:

> NDMP is not a transport protocol, it is a command protocol. [...]
> The "management" part is key; remember that it is not a transport
> protocol and you will avoid several pitfalls.

Thanks for your comment. We already use Linux LVM snapshotting for our network backup, so I don't see a special advantage of NDMP here, or did I get something wrong in your explanation?

Regarding performance, it's most important for me to know whether the difference is typically "only" 20% or an order of magnitude. When a network backup does a full backup of a filesystem with millions of files, it typically reads every single file separately, which forces a lot of seeks on the involved disks.

Idea: if an NDMP-enabled filer had block-level access to the filesystem's snapshot volume, it could be clever enough to do a full scan of the filesystem, record which blocks are in use by which file, and afterwards do the full backup by *sequentially* reading the volume at block level from first to last cylinder, skipping unused blocks, which would need significantly fewer seeks. Are there systems that work this way?

Christoph
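A back-of-the-envelope model shows why the seek count dominates at this file count. The disk parameters below are my own illustrative assumptions (a single ~8 ms-seek spindle array streaming at 80 MB/s), not measurements from the poster's hardware:

```python
# Rough model: 3.5 million files read file-by-file versus one sequential
# block-level sweep of the same 1.5 TB of live data.
AVG_SEEK_S = 0.008          # assumed seek + rotational latency, seconds
SEQ_MB_S = 80.0             # assumed sustained sequential rate, MB/s
N_FILES = 3_500_000
LIVE_DATA_MB = 1.5e6        # 1.5 TB of live data, in MB


def file_by_file_hours(seeks_per_file=1.5):
    """One-plus seeks per file (inode lookup + first extent), plus transfer."""
    seek_time = N_FILES * seeks_per_file * AVG_SEEK_S
    transfer_time = LIVE_DATA_MB / SEQ_MB_S
    return (seek_time + transfer_time) / 3600


def sequential_sweep_hours(extra_seek_fraction=0.05):
    """Mostly sequential reading; a small allowance for skipping free runs."""
    transfer_time = LIVE_DATA_MB / SEQ_MB_S
    return transfer_time * (1 + extra_seek_fraction) / 3600


print(f"file-by-file : {file_by_file_hours():.1f} h")
print(f"sequential   : {sequential_sweep_hours():.1f} h")
```

With these assumed numbers the gap is a small integer factor, not an order of magnitude, which is consistent with the estimates given later in the thread; the real ratio depends heavily on average file size and fragmentation.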
#4
Christoph Peus wrote:

> When a network backup does a full backup of a filesystem with millions
> of files, it typically reads every single file separately, which forces
> a lot of seeks on the involved disks.

That's true, but in your particular case your average file size appears to be over 400 KB, which means that (unless the files suffer from significant internal fragmentation - which you might find it worthwhile to eliminate for other reasons) a file-by-file backup (with reasonable caching of the directory and inode structure) should achieve about 1/3 of the best possible transfer bandwidth anyway (and won't have to transfer anything but live data, meaning that it could approach half the ideal performance).

> Idea: if an NDMP-enabled filer had block-level access to the
> filesystem's snapshot volume, it could [...] do the full backup by
> *sequentially* reading the volume at block level [...]

That could leave the individual files equally fragmented on the backup medium. A better approach might be for the file system itself to keep its data better-consolidated (not just for backup purposes, but for other common situations where many smallish files within a single directory may be accessed together) - if not at run-time, then via use of a suitably-intelligent defragmenter.

- bill
#5
> ...it could be clever enough to do a full scan of the filesystem,
> saving information about which blocks are in use by which file

No need for this, at least on Windows. Windows has FSCTL_GET_VOLUME_BITMAP, so excluding the free blocks from a disk image is trivial and does not require a walk of the file/directory tree. I hope the UNIXen have the same kind of IOCTL. At least most UNIX filesystems keep the free-space bitmap scattered across fixed, well-known locations on the volume, so supporting such a call would be trivial.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
http://www.storagecraft.com
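The bitmap FSCTL_GET_VOLUME_BITMAP returns is one bit per cluster (1 = allocated, LSB-first within each byte), so turning it into the list of allocated runs an imaging tool would copy really is trivial. A sketch in Python - the bitmap bytes here are made up for illustration; a real tool would obtain them from DeviceIoControl on the volume handle:

```python
def allocated_runs(bitmap: bytes, n_clusters: int):
    """Decode a one-bit-per-cluster allocation bitmap (LSB-first within
    each byte, as FSCTL_GET_VOLUME_BITMAP returns it) into a list of
    (start_cluster, length) runs of allocated clusters."""
    runs = []
    start = None
    for lcn in range(n_clusters):
        allocated = (bitmap[lcn // 8] >> (lcn % 8)) & 1
        if allocated and start is None:
            start = lcn                     # run begins
        elif not allocated and start is not None:
            runs.append((start, lcn - start))  # run ends at a free cluster
            start = None
    if start is not None:                   # run extends to end of volume
        runs.append((start, n_clusters - start))
    return runs


# Toy bitmap: clusters 0-3 allocated, 4-9 free, 10-11 allocated.
bitmap = bytes([0b00001111, 0b00001100])
print(allocated_runs(bitmap, 12))
```

An imager then reads only those runs sequentially and skips everything else, with no file-tree walk involved.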
#6
On Mon, 11 Dec 2006 17:14:16 +0100, Christoph Peus wrote:

> Thanks for your comment. We already use Linux LVM snapshotting for our
> network backup, so I don't see a special advantage of NDMP here, or did
> I get something wrong in your explanation?

Not sure where Linux is involved. Is that the current file server platform? I'll assume it is. Snapshotting in that case will get you file consistency, but it does not help with performance if you're still doing a network backup. If you were doing a dump, the performance would be improved. How much it would improve I can't say.

> Regarding performance, it's most important for me to know whether the
> difference is typically "only" 20% or an order of magnitude.

It should not be an order of magnitude difference. Something under 50%, I would guess offhand.

> When a network backup does a full backup of a filesystem with millions
> of files, it typically reads every single file separately, which forces
> a lot of seeks on the involved disks.

Right, which is why a FS dump or equivalent is faster. No special magic involved, just removing the overhead of network traffic and cpu switching. IIRC, dump does a near-sequential disk read of the data during its mapping phase (or at least enough parallel reads to seem so), which is significantly faster than the random reads needed to service single-file requests.

> Are there systems that work this way?

Can you give us a list of the potential vendors you are looking at? We may be able to give more specifics for each of those.

~F
#7
One thing you need to check is whether your NDMP backup job is restartable or not. We had configured NDMP on an 8 TB filesystem, but belatedly realised that the filer's NDMP did not support restartable jobs, so if anything happened to the job (tape error, drive error, library got rebooted, etc.) the backup would restart from scratch. Needless to say, it proved to be a big headache. Should have configured a filesystem backup...

"Faeandar" wrote in message:

> NDMP is almost always faster, primarily because of the difference in
> overhead but also because of dump's IO pattern vs. network reads. [...]
> The "management" part is key; remember that it is not a transport
> protocol and you will avoid several pitfalls.
>
> ~F