NDMP backups I/O bottleneck?

**Mike** · #1 January 17th 07, 04:08 AM posted to comp.arch.storage

Hello,

Scheme:

10x NetApp filer, 2x 1 TB volumes
ADIC library with 6x LTO-2 drives
Brocade Silkworm 3800 16 port FC Switch
Veritas Netbackup with shared storage option enabled

Backup server: IBM machine with 2x Intel Xeon and 8 GB of memory, running
Fedora Core 3 (fc3). An external array with 14x 146 GB U320 10K RPM drives
in RAID 5EE is attached.

Problem:
The volumes on the NetApps contain a large number of directories and
small files. When I start a backup (over NDMP), NetBackup mounts the
tape and starts an NDMP session between the NetApp and the tape drive.
That backup takes forever to complete.

This is a log from the netapp:
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15997
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006386
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15998
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006387
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
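The numeric header fields in these log lines can be cross-checked by hand. Below is a minimal Python sketch, assuming the NDMP v3/v4 header layout, in which the time stamp is seconds since the Unix epoch and message type 0 is a request. `decode_ndmp_header` and the dict layout are hypothetical and not part of any NDMP library. Method 1796 is 0x704, the code the NDMP v3/v4 specifications assign to NDMP_FH_ADD_DIR, which is consistent with the message name printed in the log.

```python
from datetime import datetime, timedelta, timezone

# Header fields copied from the first NDMP_FH_ADD_DIR message in the log above.
HEADER = {"sequence": 15997, "timestamp": 1169006386, "msgtype": 0,
          "method": 1796, "reply_sequence": 0}

# Assumption: NDMP header message types, 0 = request, 1 = reply.
MSG_TYPES = {0: "request", 1: "reply"}
# Fixed UTC-5 offset matching the "EST" in the log (January, so no DST).
EST = timezone(timedelta(hours=-5), "EST")


def decode_ndmp_header(hdr):
    """Turn a (hypothetical) dict of NDMP header fields into readable values."""
    when = datetime.fromtimestamp(hdr["timestamp"], tz=timezone.utc).astimezone(EST)
    return {
        "sequence": hdr["sequence"],
        "time": when.strftime("%Y-%m-%d %H:%M:%S %Z"),
        "type": MSG_TYPES.get(hdr["msgtype"], "unknown"),
        "method_hex": hex(hdr["method"]),
        # In a request, reply_sequence is 0: it is not answering anything.
        "reply_to": hdr["reply_sequence"] or None,
    }


print(decode_ndmp_header(HEADER))
```

Run as-is, it converts time stamp 1169006386 to 2007-01-16 22:59:46 EST, one second before the log line that reports it. The two sequence numbers shown are therefore file history batches sent about one second apart, each carrying 4096 directory entries.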

I'm getting a lot of those messages...

This is the tape I/O log from the same NetApp:

CPU    NFS  CIFS  HTTP      Net kB/s     Disk kB/s   Tape kB/s  Cache
                           in    out   read  write  read write    age
44%   7659     0     0   4381  11951  16155     16     0     0      4
51%   6027     0     0   4357   9726  20571  12323     0     0      4
46%   6137     0     0   3098  11679  20951  10438     0     0      4
78%   5207     0     0   3369  11168  25557   9513     0  1843      4
93%   6318     0     0   4424   9623  24429  12163     0  3270      4
46%   5846     0     0   3410   8087  16164  14505     0   516      4
31%   5225     0     0   1910   7638  11612   6680     0     0      4
31%   5986     0     0   4416  11591  11960     24     0     0      4
30%   5746     0     0   4670   9569  12627      0     0     0      4
34%   7059     0     0   4049  12973   9670      8     0     0      4

As you can see here, I'm getting tape write performance of about
563 kB/s. LTO-2 can write data at 30 MB/s (I've seen it!)
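The 563 figure can be reproduced from the Tape write column of the table above. A small Python sketch follows. The list is copied from that column, and treating 1 MB as 1000 kB is an assumption (using 1024 barely changes the result).

```python
# Tape write column (kB/s) copied from the ten samples in the table above.
TAPE_WRITE_KBPS = [0, 0, 0, 1843, 3270, 516, 0, 0, 0, 0]
LTO2_OBSERVED_MBPS = 30  # the 30 MB/s Mike reports having seen


def mean(values):
    return sum(values) / len(values)


avg_kbps = mean(TAPE_WRITE_KBPS)
# Assumption: 1 MB = 1000 kB for this rough comparison.
utilisation = avg_kbps / (LTO2_OBSERVED_MBPS * 1000)
# How many samples show any tape writes at all.
busy = sum(1 for v in TAPE_WRITE_KBPS if v > 0)

print(f"average tape write: {avg_kbps:.1f} kB/s")
print(f"fraction of 30 MB/s: {utilisation:.1%}")
print(f"samples with tape writes: {busy} of {len(TAPE_WRITE_KBPS)}")
```

It also shows that the drive wrote anything in only 3 of the 10 samples. The low average comes from the tape sitting idle most of the time, not from a uniformly slow stream.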

Now, top output from the backup server:

[root@backup root]# top

23:00:23 up 7 days, 9:41, 2 users, load average: 7.56, 7.25, 7.15
214 processes: 213 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu      user    nice  system     irq softirq  iowait    idle
            total    4.6%    0.0%    6.5%    0.0%    0.0%   88.4%    0.1%
            cpu00    4.5%    0.0%    5.3%    0.0%    0.0%   90.0%    0.0%
            cpu01    8.9%    0.0%   13.5%    0.0%    0.0%   77.4%    0.0%
            cpu02    3.5%    0.0%    2.5%    0.0%    0.0%   93.0%    0.7%
            cpu03    1.3%    0.0%    4.5%    0.3%    0.1%   93.4%    0.0%
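The same point can be read off the per-CPU numbers. Almost all time is iowait, meaning the CPUs sit idle while I/O is outstanding, and very little is user or system time. A Python sketch of that check follows. The 50% threshold and the `summarise_cpu` name are hypothetical. iowait tells you the host is waiting on I/O but not on which device, so a per-device tool such as `iostat -x` from the sysstat package is the usual next step. The load average of about 7.5 with only one running process fits as well, since Linux counts tasks in uninterruptible (typically I/O) sleep in the load average.

```python
# Per-CPU percentages copied from the top output above.
CPUS = {
    "cpu00": {"user": 4.5, "system": 5.3, "iowait": 90.0},
    "cpu01": {"user": 8.9, "system": 13.5, "iowait": 77.4},
    "cpu02": {"user": 3.5, "system": 2.5, "iowait": 93.0},
    "cpu03": {"user": 1.3, "system": 4.5, "iowait": 93.4},
}
IOWAIT_THRESHOLD = 50.0  # hypothetical cut-off, not a standard value


def summarise_cpu(cpus, threshold=IOWAIT_THRESHOLD):
    """Average user/system/iowait across CPUs and label the dominant state."""
    keys = ("user", "system", "iowait")
    avg = {k: sum(c[k] for c in cpus.values()) / len(cpus) for k in keys}
    working = avg["user"] + avg["system"]
    if avg["iowait"] >= threshold and avg["iowait"] > working:
        verdict = "waiting on I/O"
    else:
        verdict = "not I/O-wait bound"
    return avg, verdict


avg, verdict = summarise_cpu(CPUS)
print({k: round(v, 2) for k, v in avg.items()}, verdict)
```

The averages come out at about 4.55% user, 6.45% system and 88.45% iowait, in line with the "total" row that top itself prints.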

It's clearly an I/O bottleneck. The NetApp sends NDMP packets containing
directory entries, and since the backup runs in indexed mode (with
indexing off I can't restore individual files), NetBackup parses each
one and stores the information in its database on that external storage.
While that write is in progress, NetBackup doesn't send any confirmation
packets back to the NetApp.
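Mike's description amounts to a stop-and-wait exchange: the filer can't move on to the next file history batch until NetBackup has committed the previous one to its catalog. Below is a toy Python model of that, compared with an idealised pipeline where sending and committing overlap. The per-batch times are hypothetical inputs; only the 4096 entries per batch comes from the log.

```python
ENTRIES_PER_BATCH = 4096  # "Number of directories: 4096" in the log above


def stop_and_wait_s(batches, send_s, commit_s):
    """Each batch is sent and then committed before the next one starts."""
    return batches * (send_s + commit_s)


def pipelined_s(batches, send_s, commit_s):
    """Sending batch n+1 overlaps committing batch n; the slower stage sets the pace."""
    if batches == 0:
        return 0.0
    return send_s + commit_s + (batches - 1) * max(send_s, commit_s)


# Hypothetical example: sending is cheap, the catalog commit is slow.
batches, send_s, commit_s = 1000, 0.05, 0.9
print(f"{batches * ENTRIES_PER_BATCH} directory entries")
print(f"stop-and-wait: {stop_and_wait_s(batches, send_s, commit_s):.0f} s")
print(f"pipelined:     {pipelined_s(batches, send_s, commit_s):.0f} s")
```

When the commit step dominates, overlapping buys very little (950 s against about 900 s in this example). The catalog disk's write rate sets the pace either way, which suggests the real gain has to come from making those catalog writes cheaper or fewer.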

So, how do I get rid of that I/O bottleneck?

Thanks!

**Faeandar** · #2 January 18th 07, 02:34 AM posted to comp.arch.storage

On 16 Jan 2007 20:08:35 -0800, "Mike" wrote:

> The volumes on the NetApps contain a large number of directories and
> small files. When I start a backup (over NDMP), NetBackup mounts the
> tape and starts an NDMP session between the NetApp and the tape drive.
> That backup takes forever to complete.
>
> This is a log from the netapp:
> [NDMP_FH_ADD_DIR log snipped]

How long does it take for the backup to complete? And what is the
average transfer rate once it gets to phase 4?

What I think you're seeing is the same issue every file system has
when you try to do a sequential dump of a large number of files and/or
directories. There is a lot of metadata that the server has to map
before it can start streaming data to tape at full speed.

My guess is that once the dump reaches phase 4 you will see very good
tape streaming performance.
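For reference, classic BSD-style dump works in four passes: it maps regular files (I), maps directories (II), writes directories (III), and only then writes file data (IV). Assuming the filer's dump phases follow the same broad pattern, the toy Python model below shows why a volume with many small files spends so long before the tape streams. Every parameter value in it is hypothetical.

```python
# Classic BSD dump passes; only pass IV streams regular-file data to tape.
DUMP_PASSES = {
    "I": "mapping regular files",
    "II": "mapping directories",
    "III": "dumping directories",
    "IV": "dumping regular files",
}


def phase_times(n_entries, total_bytes, meta_s_per_entry, stream_bytes_per_s):
    """Toy split of dump time into metadata-bound passes (I-III) and data-bound pass IV.

    Metadata work scales with the number of files and directories; pass IV is
    modelled as pure streaming, ignoring per-file overhead.
    """
    metadata_s = n_entries * meta_s_per_entry
    data_s = total_bytes / stream_bytes_per_s
    return metadata_s, data_s


# Hypothetical volume: 20 million entries, 1 TB of data, 0.5 ms per entry, 30 MB/s.
meta, data = phase_times(20_000_000, 1e12, 0.0005, 30e6)
print(f"passes I-III: {meta / 3600:.1f} h, pass IV: {data / 3600:.1f} h")
```

The design choice is deliberate: the first term grows with file count and the second with bytes, so doubling the number of small files lengthens passes I-III without adding anything for the tape to stream.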

> I'm getting a lot of those messages...
>
> This is the tape I/O log from the same NetApp:
> [tape I/O table snipped]

Capture this information at phase 4.

> As you can see here, I'm getting tape write performance of about
> 563 kB/s. LTO-2 can write data at 30 MB/s (I've seen it!)
>
> Now, top output from the backup server:
> [top output snipped]

All this is expected and normal during phases 1 through 3 of a dump.

> It's clearly an I/O bottleneck. [...]
>
> So, how do I get rid of that I/O bottleneck?

Have fewer files and directories per volume. Or use SnapMirror to
tape. Or skip tape altogether and go with something like Avamar or
Data Domain.
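One way to act on "fewer files per volume" is to split the file population into several roughly equal backup jobs (for example one per volume or qtree) so the six drives can run in parallel. The Python sketch below uses the greedy largest-first heuristic. The directory names and counts are hypothetical, and whether a given NetBackup/ONTAP setup can back up such subsets over NDMP is an assumption to verify. Note that this spreads the mapping and catalog work across streams; it does not reduce the total amount of it.

```python
import heapq


def balance_streams(file_counts, n_streams):
    """Assign directories to n_streams groups, largest first, each to the lightest group.

    Returns a list of (total_files, stream_index, [directories]) sorted by stream index.
    """
    # Heap entries are (total, index, members); the unique index breaks ties,
    # so the member lists themselves are never compared.
    heap = [(0, i, []) for i in range(n_streams)]
    heapq.heapify(heap)
    for name, count in sorted(file_counts.items(), key=lambda kv: kv[1], reverse=True):
        total, idx, members = heapq.heappop(heap)
        members.append(name)
        heapq.heappush(heap, (total + count, idx, members))
    return sorted(heap, key=lambda group: group[1])


# Hypothetical per-directory file counts for one volume.
counts = {"/vol/vol1/home": 500, "/vol/vol1/proj": 400, "/vol/vol1/build": 300,
          "/vol/vol1/web": 200, "/vol/vol1/tmp": 100}
for total, idx, dirs in balance_streams(counts, 2):
    print(idx, total, dirs)
```

With two streams this yields 800 and 700 files; largest-first greedy is not always optimal, but it is simple and usually close enough for planning backup selections.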

This issue has plagued people for decades and is not specific to
NetApp, and certainly not to NDMP.

NDMP is simply the command protocol, not a transfer or backup
protocol. It just keeps a connection open so the filer can tell the
NDMP client (the backup server) when it's ready to start sending data
to tape via the standard *nix dump command, plus all the metadata bits,
of course.
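To make that split concrete: in NDMP terms NetBackup is the DMA (data management application), while the filer runs the data service and, with the drive reachable from the filer as described above, the tape service. The Python sketch below is a simplified summary of which traffic goes where. The role names follow NDMP terminology, but the table is not an exhaustive list of NDMP messages. It shows that the backup image never passes through the backup server, whereas file history such as NDMP_FH_ADD_DIR does, which is how the server's catalog disk can end up pacing the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    traffic: str
    source: str
    destination: str
    connection: str


# Simplified NDMP flows for a filer backing up to a tape drive it can reach directly.
FLOWS = [
    Flow("session control requests", "DMA", "NDMP server (filer)", "control"),
    Flow("notifications and log messages", "NDMP server (filer)", "DMA", "control"),
    Flow("file history, e.g. NDMP_FH_ADD_DIR", "data service", "DMA", "control"),
    Flow("backup image (dump stream)", "data service", "tape service", "data (local to filer)"),
]


def traffic_into(role):
    """List the kinds of traffic that arrive at a given role."""
    return [f.traffic for f in FLOWS if f.destination == role]


print("DMA (backup server) receives:", traffic_into("DMA"))
print("tape service receives:", traffic_into("tape service"))
```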

~F

**Moojit** · #3 January 25th 07, 01:36 PM posted to comp.arch.storage

If your server is running Windows, try using datamover to determine how
well the server-to-storage interface performs. You'll need a demo license
to use the advanced dialog features, which is probably what you want.
Please be careful: datamover can talk to logical or physical LUNs.
Physical device I/O will destroy your data, so please use logical LUNs only.

Download it from www.moojit.net
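datamover itself is not shown here. As a rough, cross-platform stand-in on a Linux backup server, the Python sketch below times writes to a temporary file on a mounted filesystem, i.e. a logical target and never a raw device, in line with the warning above. The function name and default sizes are hypothetical. Point it at the filesystem holding the NetBackup catalog (the thread does not give that path), and try a small block size as well as a large one, since catalog updates are likely to be smaller writes than a streaming test assumes.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=256):
    """Write total_mb to a temporary file in `directory`, fsync it, and return MB/s.

    Uses an ordinary file (a logical target), never a raw device, and deletes it afterwards.
    """
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = n_blocks * block_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    target = tempfile.gettempdir()  # replace with the catalog filesystem's path
    for kb in (4, 256):
        print(f"{kb:>4} KiB blocks: {write_throughput_mb_s(target, 16, kb):.1f} MB/s")
```

A single sequential test is only a first approximation; if the small-block figure is far below the large-block one, that points at the array's small-write behaviour rather than its raw bandwidth.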

**Raju Mahala** · #4 January 25th 07, 05:02 PM posted to comp.arch.storage

Have you tried putting a storage pool between the backup server and
tape? I'm not sure it works in NetBackup. We have the same issue: there
are lots of files on almost every NetApp volume, and the average file
size is very small. We use Tivoli Storage Manager, so we configured LAN
backup, which I feel works better when there are lots of small files.
The backup server first backs up files to a disk storage pool, and once
the backup has completed it migrates them from the storage pool to tape
offline, so the primary NetApp files don't stay busy all the time.
Please check whether a disk storage pool option exists in NetBackup.
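The benefit described here is mostly about the backup window: the source is only busy while data lands on disk, and the later disk-to-tape copy can stream large sequential images. Below is a back-of-envelope Python sketch in which all rates are hypothetical inputs. NetBackup releases of that era offered disk staging storage units, which may be the nearest equivalent, but whether NDMP policies can target one in a given version is something to check in its documentation. Staging also does not by itself remove the catalog writes Mike is seeing, since the file history still has to be recorded.

```python
MB_PER_GB = 1024  # for these rough estimates


def hours(data_gb, rate_mb_s):
    """Time in hours to move data_gb at a sustained rate_mb_s."""
    return data_gb * MB_PER_GB / rate_mb_s / 3600


def compare(data_gb, direct_mb_s, to_disk_mb_s, destage_mb_s):
    """Hours for direct-to-tape versus the two steps of disk staging."""
    return {
        "direct_to_tape": hours(data_gb, direct_mb_s),
        "staging_window": hours(data_gb, to_disk_mb_s),
        "destage_to_tape": hours(data_gb, destage_mb_s),
    }


# Hypothetical: 100 GB; 0.56 MB/s direct (roughly the rate in Mike's table),
# 10 MB/s into the disk pool, 30 MB/s when destaging to tape.
for name, h in compare(100, 0.56, 10, 30).items():
    print(f"{name}: {h:.1f} h")
```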

Regards,
Raju


