WAFL writing across the RAID groups in an aggregate



 
 
#1 - February 1st 07, 02:25 PM, posted to comp.arch.storage
Raju Mahala

I believe that if an aggregate has more than one RAID group, then data is written across all the RAID groups in a horizontal fashion, so a bigger aggregate should give better throughput because there are more spindles.

First, any comment on this: am I right or not?

When I check disk utilization with statit, I sometimes find that the data-transfer commands issued per second are not roughly equal across the disks of the RAID groups in a single aggregate.

For example, see below:

disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr0/plex0/rg0:
0c.48 9 9.29 2.30 1.07 23280 6.82 4.29 2727 0.17 6.75 1548 0.00 .... . 0.00 .... .
0c.32 8 8.74 0.37 1.41 101484 8.21 3.90 2261 0.15 3.78 2706 0.00 .... . 0.00 .... .
/aggr1/plex0/rg0:
0c.17 56 48.68 1.47 1.00 8243 34.74 26.74 1022 12.47 10.61 1558 0.00 .... . 0.00 .... .
0c.49 55 49.92 1.45 1.00 23947 35.97 25.90 938 12.50 10.70 1465 0.00 .... . 0.00 .... .
0c.33 70 90.88 30.82 1.23 21822 40.71 17.17 1828 19.35 9.39 2273 0.00 .... . 0.00 .... .
0c.18 67 84.87 27.87 1.29 20410 38.13 18.40 1693 18.87 9.23 2238 0.00 .... . 0.00 .... .
0c.50 65 85.62 27.42 1.21 21775 38.85 17.42 1700 19.35 9.95 2001 0.00 .... . 0.00 .... .
0c.34 68 86.57 27.34 1.23 22603 39.86 17.34 1833 19.37 9.55 2194 0.00 .... . 0.00 .... .
0c.19 67 84.99 26.83 1.26 21149 39.74 17.68 1761 18.42 9.18 2228 0.00 .... . 0.00 .... .
0c.51 65 83.36 25.87 1.27 20110 39.08 17.81 1637 18.41 9.65 1977 0.00 .... . 0.00 .... .
0c.35 68 85.35 28.77 1.21 23676 38.13 18.46 1741 18.45 9.25 2320 0.00 .... . 0.00 .... .
0c.20 67 84.76 27.39 1.23 22127 38.27 17.88 1735 19.10 9.88 2048 0.00 .... . 0.00 .... .
0c.52 69 84.83 28.35 1.27 22185 37.83 18.25 1798 18.65 9.61 2230 0.00 .... . 0.00 .... .
0c.36 68 85.39 27.73 1.27 21596 38.73 17.91 1814 18.93 9.53 2192 0.00 .... . 0.00 .... .
0c.21 67 86.39 28.37 1.27 22485 38.63 17.56 1812 19.39 9.71 2123 0.00 .... . 0.00 .... .
0c.53 69 87.23 28.89 1.26 22340 39.12 17.78 1884 19.21 9.37 2252 0.00 .... . 0.00 .... .
0c.37 69 86.72 27.67 1.27 21195 39.73 17.72 1842 19.32 9.31 2217 0.00 .... . 0.00 .... .
0c.22 68 85.33 27.39 1.24 21374 38.76 18.08 1801 19.18 9.31 2144 0.00 .... . 0.00 .... .
/aggr1/plex0/rg1:
0c.38 58 54.53 0.00 .... . 37.39 27.59 974 17.14 9.69 1608 0.00 .... . 0.00 .... .
0c.54 59 54.79 0.00 .... . 37.65 27.41 1005 17.14 9.75 1650 0.00 .... . 0.00 .... .
0c.23 72 107.07 28.73 1.23 22749 52.13 14.50 1927 26.20 8.20 2296 0.00 .... . 0.00 .... .
0c.39 73 107.10 28.60 1.28 21650 51.87 14.85 1901 26.64 7.80 2418 0.00 .... . 0.00 .... .
0c.55 74 105.45 28.75 1.27 22783 50.68 15.00 1931 26.03 7.93 2471 0.00 .... . 0.00 .... .
0c.24 72 106.05 27.82 1.27 22016 52.02 14.79 1903 26.21 7.61 2392 0.00 .... . 0.00 .... .
0c.40 74 107.03 29.17 1.22 23488 52.14 14.77 1972 25.72 7.82 2526 0.00 .... . 0.00 .... .
0c.56 71 105.81 28.23 1.23 22033 51.59 14.88 1806 25.98 7.91 2191 0.00 .... . 0.00 .... .
0c.25 71 104.19 27.27 1.25 22330 51.15 15.05 1866 25.76 7.86 2252 0.00 .... . 0.00 .... .
0c.41 72 105.07 28.23 1.20 24299 51.23 14.87 1933 25.61 8.01 2369 0.00 .... . 0.00 .... .
0c.57 73 106.22 27.95 1.24 23069 51.88 14.76 1966 26.38 7.76 2409 0.00 .... . 0.00 .... .
0c.26 72 105.71 27.94 1.24 22384 51.79 14.99 1910 25.98 7.59 2376 0.00 .... . 0.00 .... .
0c.42 74 107.23 28.76 1.20 23742 51.98 14.83 1965 26.49 7.46 2531 0.00 .... . 0.00 .... .
0c.58 74 106.30 28.43 1.24 23027 51.76 14.98 1979 26.11 7.74 2459 0.00 .... . 0.00 .... .
0c.27 72 106.53 28.27 1.22 22733 52.02 14.66 1927 26.25 8.26 2184 0.00 .... . 0.00 .... .
0c.43 73 107.20 28.48 1.19 24864 51.43 14.63 1979 27.29 7.95 2325 0.00 .... . 0.00 .... .


Here rg0 shows fewer data-transfer commands issued per second than rg1, yet both rg0 and rg1 are part of a single aggregate, aggr1.

Can anyone comment on why this is so?
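
For reference, here is a minimal parsing sketch (plain Python, not an official NetApp tool) that tallies the xfers/sec column of a statit disk section per RAID group. It assumes the one-row-per-disk layout shown above, and the file name in the usage comment is just a placeholder.

from collections import defaultdict

def xfers_per_raid_group(statit_text):
    # raid group path -> (summed xfers/sec, number of data rows seen)
    totals = defaultdict(float)
    counts = defaultdict(int)
    group = None
    for line in statit_text.splitlines():
        line = line.strip()
        if line.endswith(":"):            # group header, e.g. /aggr1/plex0/rg0:
            group = line.rstrip(":")
        elif group and line and not line.startswith("disk"):
            fields = line.split()
            # fields[0] = disk, fields[1] = ut%, fields[2] = xfers/sec
            totals[group] += float(fields[2])
            counts[group] += 1
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}

# Usage (file name is just an example):
# for rg, (total, avg) in sorted(xfers_per_raid_group(open("statit.txt").read()).items()):
#     print(rg, "total %.1f xfers/s" % total, "avg %.1f per disk" % avg)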

#2 - February 1st 07, 11:57 PM, posted to comp.arch.storage
Faeandar

On 1 Feb 2007 05:25:49 -0800, "Raju Mahala" wrote:

[quoted text snipped]



Well, looking at the utilization of your parity drives, I would say these are all reads, in which case the placement of the original data is what matters most. If the data being read sits primarily on one RAID group, then that RAID group is going to do more work.
I'm no expert on statit, but to me it still looks like things are balanced pretty evenly.

You are correct that written data gets striped across all RAID groups more or less evenly. I say "more or less" because I do not know what algorithm decides where a new write picks up if the last write did not span one entire RAID group, but generally writes are striped across all RAID groups in the aggregate.
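
To make that concrete, here is a toy round-robin sketch of spreading write chunks over the RAID groups of an aggregate. It only illustrates the general idea and is not WAFL's actual allocation algorithm; the group names and chunk count are made up.

def place_chunks(num_chunks, raid_groups, start_group=0):
    # Count how many chunks land on each RAID group when writes simply rotate.
    placement = {rg: 0 for rg in raid_groups}
    g = start_group
    for _ in range(num_chunks):
        placement[raid_groups[g]] += 1
        g = (g + 1) % len(raid_groups)   # move on to the next RAID group
    return placement, g                  # g = where the next write would pick up

placement, next_start = place_chunks(101, ["rg0", "rg1"])
print(placement)    # {'rg0': 51, 'rg1': 50} -> roughly, but not exactly, even
print(next_start)   # 1 -> the next write picks up on rg1, not rg0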

There is a diminishing return between aggregate performance and drive count. I believe the maximum was around 50 drives; after that, adding another drive to the aggregate gives no further performance benefit. That test was done by NetApp using their own write algorithms and striping methods, so it may not hold for other vendors.

~F
#3 - February 2nd 07, 05:59 PM, posted to comp.arch.storage
Raju Mahala

On Feb 2, 3:57 am, Faeandar wrote:

[quoted text snipped]


Thanks Faeandar, that is a nice, detailed comment that gives me a new direction for debugging and configuration.
Can you suggest any commands for debugging slow performance and back-to-back CPs? I normally use sysstat, "qtree stats", and statit.
Thanks once again.

#4 - February 2nd 07, 08:15 PM, posted to comp.arch.storage
Faeandar

On 2 Feb 2007 08:59:43 -0800, "Raju Mahala" wrote:

[quoted text snipped]
Thanks once again



Back-to-back CPs indicate write-intensive loads, which this statit output doesn't show. I think we're running down different paths here.

Most filer models handle reads well: nice wide stripes to read from and a fair amount of cache to hold them. If you are blowing through that cache frequently, then your reads are either a) huge and varying or b) unique per client.

Back-to-back CPs occur when NVRAM fills and needs to be flushed to disk. By default it is flushed every 10 seconds, or when full. Back-to-back means it is filling as fast as it flushes, or faster.
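
A simplified cartoon of that behaviour (made-up NVRAM size and rates, not ONTAP's actual CP logic) looks like this:

def simulate_cps(write_mb_per_s, flush_mb_per_s, nvram_mb=512, seconds=60):
    # Count back-to-back CPs: NVRAM is flushed every 10 s or when full; if it
    # refills before the flush finishes, the next CP starts immediately.
    filled, timer, back_to_back = 0.0, 0, 0
    for _ in range(seconds):
        filled = min(nvram_mb, filled + write_mb_per_s)
        timer += 1
        if filled >= nvram_mb or timer >= 10:
            flush_time = filled / flush_mb_per_s      # seconds this CP takes
            refill = write_mb_per_s * flush_time      # writes arriving meanwhile
            if refill >= nvram_mb:
                back_to_back += 1                     # full again before the CP ended
            filled = min(nvram_mb, refill)
            timer = 0
    return back_to_back

print(simulate_cps(write_mb_per_s=20, flush_mb_per_s=100))    # 0: the flush keeps up
print(simulate_cps(write_mb_per_s=120, flush_mb_per_s=100))   # >0: back-to-back CPs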

sysstat, qtree stats, and statit are all very good tools for debugging issues. Add cifs top or nfsstat -l and you have pretty much the complete toolbox available on a filer. cifs top and nfsstat -l will show you the top-ops client; from there you can work out why that client is doing so many ops and decide whether that is just its workload or an actual problem.
You have to enable per-client stats options for CIFS and/or NFS for this to work, though.
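
For illustration only: once you have per-client op counts out of those tools, ranking the top talkers is simple. The two-column "client ops" input format and the file name below are assumptions for the sketch, not the real output format of cifs top or nfsstat -l.

def top_clients(lines, n=5):
    # Sum ops per client from lines of the form "<client> <ops>", then rank.
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counts[parts[0]] = counts.get(parts[0], 0) + int(parts[1])
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

# with open("per_client_ops.txt") as f:     # hypothetical export of per-client stats
#     for client, ops in top_clients(f):
#         print(client, ops)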

~F
#5 - February 3rd 07, 04:47 AM, posted to comp.arch.storage
Raju Mahala

On Feb 3, 12:15 am, Faeandar wrote:

[quoted text snipped]


How can I check cache utilization? CPU utilization becomes very high, as seen in sysstat. I will also check the per-client stats and send you the data.
As the statit output shows, the parity-disk I/Os are only about half of the data-disk I/Os, so doesn't that mean reads and writes are roughly equal? Also, I found in the statit output that most of the writes are partial stripes rather than full stripes; what does that indicate?
Just curious, as an aside: suppose only heavy writes are happening and back-to-back CPs come into the picture. Will reads be affected in that case?


 



