#1
NDMP backups I/O bottleneck?
Hello,
Scheme:

- 10x NetApp filers, 2x 1 TB volumes
- ADIC library with 6x LTO-2 drives
- Brocade SilkWorm 3800 16-port FC switch
- Veritas NetBackup with the Shared Storage Option enabled
- Backup server: IBM machine with 2x Intel Xeon and 8 GB memory, running FC3 (Fedora Core 3), with external storage attached (14x 146 GB U320 10K RPM drives in RAID 5EE)

Problem: Volumes on the NetApps have a lot of directories and small files. When I start a backup (over NDMP), NetBackup mounts the tape and starts an NDMP session between the NetApp and the tape drive. It takes forever to complete that backup. This is a log from the NetApp:

    Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
    Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
    Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15997
    Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006386
    Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
    Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
    Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
    Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
    Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
    Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
    Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
    Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15998
    Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006387
    Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
    Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
    Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
    Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
    Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096

I'm getting a lot of those messages. This is the tape I/O log from the same NetApp:
     CPU   NFS  CIFS  HTTP      Net kB/s     Disk kB/s     Tape kB/s  Cache
                                 in    out   read  write   read  write    age
     44%  7659     0     0   4381  11951  16155     16      0      0      4
     51%  6027     0     0   4357   9726  20571  12323      0      0      4
     46%  6137     0     0   3098  11679  20951  10438      0      0      4
     78%  5207     0     0   3369  11168  25557   9513      0   1843      4
     93%  6318     0     0   4424   9623  24429  12163      0   3270      4
     46%  5846     0     0   3410   8087  16164  14505      0    516      4
     31%  5225     0     0   1910   7638  11612   6680      0      0      4
     31%  5986     0     0   4416  11591  11960     24      0      0      4
     30%  5746     0     0   4670   9569  12627      0      0      0      4
     34%  7059     0     0   4049  12973   9670      8      0      0      4

As you can see, I'm getting tape write performance of about 563 kB/s. LTO-2 can write data at 30 MB/s (I've seen it!).

Now, the top output from the backup server:

    [root@backup root]# top
    23:00:23 up 7 days, 9:41, 2 users, load average: 7.56, 7.25, 7.15
    214 processes: 213 sleeping, 1 running, 0 zombie, 0 stopped
    CPU states:  cpu    user    nice  system     irq softirq  iowait    idle
               total    4.6%    0.0%    6.5%    0.0%    0.0%   88.4%    0.1%
               cpu00    4.5%    0.0%    5.3%    0.0%    0.0%   90.0%    0.0%
               cpu01    8.9%    0.0%   13.5%    0.0%    0.0%   77.4%    0.0%
               cpu02    3.5%    0.0%    2.5%    0.0%    0.0%   93.0%    0.7%
               cpu03    1.3%    0.0%    4.5%    0.3%    0.1%   93.4%    0.0%

It's clearly an I/O bottleneck. The NetApp sends an NDMP packet with directories, and since the backup runs in indexed mode (with indexes off I can't restore by file), NetBackup parses it and stores the information in the DB on that external storage. While it is storing that data, NetBackup doesn't send any confirmation packets back to the NetApp. So, how do I get rid of that I/O bottleneck?

Thanks!
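For reference, the arithmetic behind the 563 kB/s figure, and what that rate would mean for a whole volume, can be sketched in Python. This is a back-of-the-envelope sketch, not a NetBackup or Data ONTAP tool. It averages the Tape write column from the ten samples in the post and assumes equal sample intervals, kB = 1000 bytes, and a fully populated 1 TB volume streamed at a constant rate; those assumptions are mine, not the poster's.

```python
# Tape "write" column (kB/s) from the ten samples in the tape I/O log above.
TAPE_WRITE_KBPS = [0, 0, 0, 1843, 3270, 516, 0, 0, 0, 0]


def average_rate(samples):
    """Mean rate, assuming every sample covers an equal interval."""
    return sum(samples) / len(samples)


def hours_to_write(volume_bytes, bytes_per_second):
    """Hours needed to stream volume_bytes at a constant rate."""
    return volume_bytes / bytes_per_second / 3600


avg_kbps = average_rate(TAPE_WRITE_KBPS)         # 562.9 kB/s
volume = 1000**4                                 # assumption: a full 1 TB volume (decimal units)
slow = hours_to_write(volume, avg_kbps * 1000)   # at the observed average
fast = hours_to_write(volume, 30 * 1000**2)      # at 30 MB/s streaming

print(f"average tape write: {avg_kbps:.1f} kB/s")
print(f"1 TB at that rate:  {slow:.0f} h (about {slow / 24:.0f} days)")
print(f"1 TB at 30 MB/s:    {fast:.1f} h")
```

At the observed average, a full 1 TB volume would need roughly 490 hours (about three weeks), against a little over 9 hours at 30 MB/s.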
#2
On 16 Jan 2007 20:08:35 -0800, "Mike" wrote:
> Problem: Volumes on the netapps have a lot of directories and small files.
> [...] It takes forever to complete that backup.

How long does it take for the backup to complete? And what is the average transfer rate once it gets to phase 4?

What I think you're seeing is the same issue every file system has when you try to do a sequential dump of a large number of files and/or directories. There is a lot of metadata that the server has to map before it can start streaming to tape at full speed. My guess is that once you get to phase 4 of the dump you will see very good tape streaming performance.

> This is tape I/O log from the same netapp.. [tape I/O log snipped]

Capture this information at phase 4.

> Now, top output from the backup server: [top output snipped]

All this is expected and normal during phases 1 through 3 of a dump.

> So, how do I get rid of that I/O bottleneck?

Have fewer files and directories per volume. Or use SnapMirror to tape. Or skip tape altogether and go with something like Avamar or Data Domain. This issue has plagued people for decades and is not specific to NetApp, and certainly not to NDMP.

NDMP is simply the command protocol, not a transfer or backup protocol. It keeps a connection open so the filer can tell the NDMP client (the backup server) when it's ready to start sending data to tape via the standard *nix dump command. Plus all the metadata bits, of course.

~F
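To make "the command protocol plus the metadata bits" a little more concrete: as far as I understand the NDMP v4 specification, each message header dumped in Mike's log is six 32-bit big-endian XDR fields (sequence, time_stamp, message_type, message, reply_sequence, error). The Python sketch below rebuilds the first header from that log and decodes it again. The field layout and the small method-code table are my reading of the spec and should be checked against it; TCP record marking and the message body (the directory entries themselves) are deliberately not handled.

```python
import struct
from datetime import datetime, timezone

# Partial table of NDMP v3/v4 file-history method codes (assumed from the spec).
NDMP_METHODS = {
    0x703: "NDMP_FH_ADD_FILE",
    0x704: "NDMP_FH_ADD_DIR",
    0x705: "NDMP_FH_ADD_NODE",
}
MESSAGE_TYPES = {0: "NDMP_MESSAGE_REQUEST", 1: "NDMP_MESSAGE_REPLY"}

# Six unsigned 32-bit big-endian words: sequence, time_stamp, message_type,
# message, reply_sequence, error.
HEADER = struct.Struct(">6I")


def decode_header(raw: bytes) -> dict:
    """Decode the fixed-size header at the start of one NDMP message."""
    seq, ts, mtype, method, reply_seq, err = HEADER.unpack(raw[:HEADER.size])
    return {
        "sequence": seq,
        "time": datetime.fromtimestamp(ts, tz=timezone.utc),
        "type": MESSAGE_TYPES.get(mtype, mtype),
        "method": NDMP_METHODS.get(method, hex(method)),
        "reply_sequence": reply_seq,
        "error": err,  # 0 corresponds to NDMP_NO_ERR
    }


# Rebuild the header from the first log entry (Sequence 15997) and decode it.
raw = HEADER.pack(15997, 1169006386, 0, 1796, 0, 0)
hdr = decode_header(raw)
print(hdr["method"], hdr["time"].isoformat())
```

Method 1796 is 0x704, and the Timestamp field is Unix time: 1169006386 decodes to 03:59:46 UTC on 17 Jan 2007, which is 22:59:46 EST, one second before the syslog stamp on that line.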
#3
If your server is running Windows, try using datamover to determine how well the server-to-storage interface performs. You'll need a demo license to use the advanced dialog features, which is probably what you want to do. Please be careful: datamover can talk to logical or physical LUNs. Physical device I/O will destroy your data, so please use logical only.

Download from: www.moojit.net
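datamover is a Windows tool and the backup server in this thread runs Linux, so as a rough stand-in, here is a hypothetical Python probe of small synchronous writes: many 4 KiB writes, each forced to disk with fsync. It is not datamover and does not measure what datamover measures. The assumption that the catalog database produces this kind of I/O is mine, as are the target directory and the 200-write default. It only creates and deletes a scratch file inside an ordinary directory, never a raw device or LUN.

```python
import os
import tempfile
import time


def small_sync_write_rate(directory: str, count: int = 200, size: int = 4096) -> float:
    """Return small fsync'd writes per second into a scratch file in `directory`.

    Each write is followed by os.fsync, so every iteration waits for the
    storage to acknowledge the data. Only a temporary file is touched.
    """
    block = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical target: point this at the filesystem holding the catalog.
    target = tempfile.gettempdir()
    rate = small_sync_write_rate(target)
    print(f"{rate:.0f} synchronous 4 KiB writes/s in {target}")
```

Running it against the catalog filesystem and against a different local disk gives a crude comparison; a low figure on the catalog volume would be consistent with the high iowait in Mike's top output.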
#4
Have you tried a storage pool between the backup server and tape? I am not sure it works in NetBackup. We have the same issue: lots of files on almost every NetApp volume, with a very small average file size. We use Tivoli Storage Manager, so we configured LAN backup, which I feel is better when there are lots of small files. The backup server first backs up the files to a disk storage pool, and once the backup has completed it migrates them from the storage pool to tape offline, so the primary NetApp files don't stay busy all the time. Please check whether a disk storage pool option exists in NetBackup.

Regards,
Raju
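The disk-pool idea Raju describes can be modelled in a few lines. This is a toy sketch, not how TSM or NetBackup implement it: shutil.copytree plays the backup into a disk storage pool, and a tarfile archive stands in for the tape drive during the later offline migration. The function names and directory layout are hypothetical.

```python
import os
import shutil
import tarfile
import tempfile


def stage_to_disk(source_dir: str, pool_dir: str) -> int:
    """Step 1: copy the (many, small) files into a disk staging pool."""
    shutil.copytree(source_dir, pool_dir, dirs_exist_ok=True)
    return sum(len(files) for _, _, files in os.walk(pool_dir))


def destage_to_tape(pool_dir: str, tape_image: str) -> int:
    """Step 2: migrate the pool as one sequential stream.

    A tar file stands in for the tape device here.
    """
    with tarfile.open(tape_image, "w") as tar:
        tar.add(pool_dir, arcname=".")
    with tarfile.open(tape_image) as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "volume")  # stand-in for a NetApp volume
        os.makedirs(src)
        for i in range(1000):
            with open(os.path.join(src, f"f{i:04d}.txt"), "w") as fh:
                fh.write("small file\n")
        pool = os.path.join(work, "diskpool")
        staged = stage_to_disk(src, pool)
        written = destage_to_tape(pool, os.path.join(work, "tape.tar"))
        print(staged, written)  # 1000 1000
```

The point of the second step is that the tape only ever sees one long sequential stream, however many small files the volume holds.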