Unimpressive performance of large MD raid



 
 
  #11  
Old April 24th 09, 08:16 AM posted to comp.os.linux.development.system,comp.arch.storage
David Brown[_2_]

Bill Todd wrote:
kkkk wrote:

...

Some interesting questions. Perhaps you're getting slammed by people
for your configuration choices (which do not seem unreasonable given the
also-not-unreasonable goals that you say drove them) because they're
embarrassed to admit that they have no idea what the answers to those
questions are (which is too bad, because people like me would find
correct answers to them interesting).

Calypso seems especially ignorant when talking about optimal RAID group
sizes. Perhaps he's confusing RAID-5/6 with RAID-3 - but even then he'd
be wrong, since what you really want with RAID-3 is for the total *data*
content (excluding parity) of a stripe to be a convenient value, meaning
that you tend to favor group sizes like 5 or 9 (not counting any spares
that may be present). And given that you've got both processing power
and probably system/memory bus bandwidth to burn, there's no reason why
a software RAID-6 implementation shouldn't perform fairly competitively
with a hardware one.

That said, it's possible that the Linux system file cache interacts
poorly with md in terms of how it destages data to the array - e.g., if
it hands data to md in chunks that don't correspond to a full stripe set
of data to write out (I'm assuming without looking at the code that md
implements RAID-6 such that it can write out a full group of stripes
without having to read in anything) *and* doesn't tell md that the write
is lazy (allowing md to accumulate data in its own buffers until a
convenient amount has arrived - assuming that they're large enough) then
even sequential writes could get pretty expensive (as you seem to be
seeing). A smart implementation might couple the file cache with md
such that no such copy operation was necessary at all, but that would
tend to complicate the layering interface.
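
(As a rough worked example, assuming a 64 KiB md chunk size: a 12-drive
RAID-6 stripe carries 10 data chunks, so a full-stripe write is
10 x 64 KiB = 640 KiB of data plus 2 x 64 KiB of parity, and anything
smaller or misaligned forces md to read before it can write.)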

Or it's conceivable that ext3's journaling is messing you up,
particularly if you've got the journal on the RAID-6 LUN. If you don't
need the protection of journaling, try using ext2; if you do need it,
make sure that the journal isn't parity-protected (e.g., mirror it
instead of just adding it to the RAID-6 LUN).


An alternative to consider, especially if you are working mainly with
large files, is xfs rather than ext3. xfs handles large files better
(mainly due to its support for extents), and has good support for
working with raid (it aligns its data and metadata to the raid
stripes).
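
As a rough sketch (the 64k chunk size here is just an example; use
whatever chunk size your md array was created with), you can hand the
raid geometry to mkfs.xfs explicitly:

   # 12-disk RAID-6 = 10 data-bearing disks per stripe
   # su = md chunk size, sw = number of data disks
   mkfs.xfs -d su=64k,sw=10 /dev/md0

I believe newer mkfs.xfs versions pick this geometry up automatically
when run directly against an md device.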

I did spend a few minutes in Google trying to find detailed information
about md's RAID-6 implementation but got nowhere. Perhaps its
developers think that no one who isn't willing to read the code has any
business trying to understand its internals - though that attitude would
be difficult to justify in the current case given that they didn't seem
to do a very good job of providing the performance that one might
reasonably expect from a default set-up.


There is a lot more information about linux raid5 than raid6. I think
that reflects usage. Raid 6 is typically used when you have a larger
number of drives - say, 8 or more. People using such large arrays are
much more likely to be looking for higher-end solutions with strong
support contracts, and are thus more likely to be using something with
high-end hardware raid cards. Raid 5 needs only 3 disks, and is a very
common solution for small servers. If you search around for
configuration how-tos, benchmarks, etc., you'll find relatively few that
have more than 4 disks, and therefore few that use raid 6. There's also
a trend (so I've read) towards raid 10 (whether it be linux raid10, or
standard raid 1 + 0) rather than raid 5/6 because of better recovery.
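
For what it's worth, a minimal linux raid10 creation command might look
like this (device names and chunk size are placeholders, not a
recommendation):

   mdadm --create /dev/md0 --level=10 --raid-devices=12 --chunk=64 /dev/sd[b-m]1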


- bill

  #12  
Old April 24th 09, 10:10 AM posted to comp.os.linux.development.system,comp.arch.storage
kkkk

David Schwartz wrote:
It is the bottleneck, it's just not a CPU bottleneck, it's an I/O
bottleneck.


With an 8x PCI-e link there should be room for about 2 GB/sec of transfer...
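
(A PCI-e 1.x lane carries roughly 250 MB/sec per direction after 8b/10b
encoding, so an x8 link is good for about 2 GB/sec raw, far more than
the throughput measured here.)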

The problem is simply the number of I/Os the system has to
issue. With a 12 disk RAID 6 array implemented in software, a write of
a single byte (admittedly the worst case) will require 10 reads
followed by 12 writes that cannot be started until all 10 reads
complete. Each of these operations has to be started and completed by
the MD driver.


This is true only for non-sequential writes.

In my case the system starts writing 5 seconds after dd starts pushing
data out (dirty_writeback_centisecs = 500). By that time there is so
much sequential data queued that it will fill many stripes completely.
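
For reference, this is roughly the kind of tunable and test being
discussed (paths and sizes are placeholders only):

   cat /proc/sys/vm/dirty_writeback_centisecs   # writeback wakeup interval, in centisecs
   cat /proc/sys/vm/dirty_background_ratio      # % of memory dirtied before background flush starts
   # sequential write test; conv=fdatasync makes dd wait until the data reaches the array
   dd if=/dev/zero of=/mnt/raid/test.bin bs=1M count=10240 conv=fdatasync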
  #13  
Old April 24th 09, 10:23 AM posted to comp.os.linux.development.system,comp.arch.storage
kkkk

David Brown wrote:

I did spend a few minutes in Google trying to find detailed
information about md's RAID-6 implementation but got nowhere.

...
There is a lot more information about linux raid5 than raid6.


You mean on *Linux MD* raid5? That could be good. Where?

Raid-6 algorithms are practically equivalent to raid-5, except for the
parity computation, obviously.
  #14  
Old April 24th 09, 10:56 AM posted to comp.os.linux.development.system,comp.arch.storage
kkkk

Bill Todd wrote:
That said, it's possible that the Linux system file cache interacts
poorly with md in terms of how it destages data to the array - e.g., if
it hands data to md in chunks that don't correspond to a full stripe set
of data to write out (I'm assuming without looking at the code that md
implements RAID-6 such that it can write out a full group of stripes
without having to read in anything) *and* doesn't tell md that the write
is lazy (allowing md to accumulate data in its own buffers until a
convenient amount has arrived - assuming that they're large enough) then
even sequential writes could get pretty expensive (as you seem to be
seeing). A smart implementation might couple the file cache with md
such that no such copy operation was necessary at all, but that would
tend to complicate the layering interface.


In my case dd pushes 5 seconds of data before the disks start writing
(dirty_writeback_centisecs = 500). dd always stays at least 5 seconds
ahead of the writes. This should fill all stripes completely, causing no
reads. I even tried raising dirty_writeback_centisecs, with no
measurable performance benefit.

Where are these 5 seconds of data stored? At the ext3 layer, at the LVM
layer (I doubt this one; I also notice there is no LVM kernel thread
running), or at the MD layer?

Why do you think dd stays at 100% CPU (with the disk/3ware caches
enabled)? Shouldn't that be 0%?

Do you think the CPU usage is high due to a memory-copy operation? If
that were the case, I would expect dd from /dev/zero to /dev/null to go
at about 200MB/sec, yet it goes at 1.1GB/sec (with 100% CPU occupation
indeed, 65% of which is in kernel mode). That would mean the number of
copies performed by dd while writing to the ext3 raid is 5 times greater
than for copying from /dev/zero to /dev/null. Hmmm... a bit difficult to
believe; there must be other work done in the ext3 case that hogs the
CPU. Is the ext3 code running within the dd process when dd writes?
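
To see where the time actually goes while dd runs, something along these
lines should be enough (all standard tools, nothing md-specific):

   vmstat 1      # user / system / iowait CPU split
   iostat -x 1   # per-disk utilisation, queue size and average request size
   top           # watch dd, the md array's kernel thread, kjournald and pdflush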


Or it's conceivable that ext3's journaling is messing you up,
particularly if you've got the journal on the RAID-6 LUN. If you don't
need the protection of journaling, try using ext2; if you do need it,
make sure that the journal isn't parity-protected (e.g., mirror it
instead of just adding it to the RAID-6 LUN).


I think this overhead should affect first-write performance but not
rewrite performance with the default ext3 mount options (the default
should be data=ordered, which I think means no journal writes for data
rewrites). Am I correct?

Hmm, probably not, because kjournald had significant CPU occupation.
What is the role of the journal during file overwrites?
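
For what it's worth, a rough sketch of the mirrored-external-journal
idea (assuming /dev/md1 is a small spare RAID-1 and the filesystem sits
on /dev/md0; substitute the LV device if ext3 is on LVM, and note the
journal device must use the same block size as the filesystem):

   mke2fs -b 4096 -O journal_dev /dev/md1    # format the mirror as an external journal
   tune2fs -O ^has_journal /dev/md0          # drop the internal journal (filesystem unmounted)
   tune2fs -j -J device=/dev/md1 /dev/md0    # re-attach the journal, now on the mirror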


I did spend a few minutes in Google trying to find detailed information
about md's RAID-6 implementation but got nowhere. Perhaps its
developers think that no one who isn't willing to read the code has any
business trying to understand its internals - though that attitude would
be difficult to justify in the current case given that they didn't seem
to do a very good job of providing the performance that one might
reasonably expect from a default set-up.


Agreed.

Thanks for your answer.
  #17  
Old April 24th 09, 12:02 PM posted to comp.os.linux.development.system,comp.arch.storage
David Schwartz

On Apr 24, 2:10 am, kkkk wrote:
David Schwartz wrote:
It is the bottleneck, it's just not a CPU bottleneck, it's an I/O
bottleneck.


With an 8x PCI-e link there should be room for about 2 GB/sec of transfer...


Yeah, I agree with you. It looks like an MD issue. On the bright side,
I heard from a reliable source that:

"Furthermore we trust visible, open, old/tested, linux MD code more
than any embedded RAID code which nobody knows except 3ware. What if
there was a bug in 9650SE code? It was a recent controller when we
bought it, and we would have found out only later, maybe years later
after setting up our array. Also, we were already proficient with
linux MD."

The flipside is, you have an untested configuration and nobody
specific who is obligated to provide you with support. You're probably
ahead of the curve, so you may hit every problem before anyone else
does.

DS

  #18  
Old April 24th 09, 12:41 PM posted to comp.os.linux.development.system,comp.arch.storage
Maxim S. Shatskih[_2_]

NTFS unsafe in case of power loss?

User data is not protected by the journaling.

You missed something, we're not talking about FAT here (which is faster than NTFS)...


Depends on the scenario. With 2000 files per directory, things do change: FAT uses linear directories, while NTFS uses B-trees similar to database indices.

--
Maxim S. Shatskih
Windows DDK MVP

http://www.storagecraft.com

  #19  
Old April 24th 09, 01:19 PM posted to comp.os.linux.development.system,comp.arch.storage
David Brown[_2_]

kkkk wrote:
David Brown wrote:

I did spend a few minutes in Google trying to find detailed
information about md's RAID-6 implementation but got nowhere.

...
There is a lot more information about linux raid5 than raid6.


You mean on *Linux MD* raid5? That could be good. Where?


Google for "linux raid 5" - there are a few million hits, most of which
are for software raid (i.e., MD raid). Googling for "linux raid 6" only
gets you a few hundred thousand hits.

Raid-6 algorithms are practically equivalent to raid-5, except for the
parity computation, obviously.


Here is a link that might be useful, if you want to know the details of
Linux raid 6:

http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
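
In short, from that paper: with n data disks D_0 ... D_(n-1), the two
parity blocks per stripe are

   P = D_0 xor D_1 xor ... xor D_(n-1)
   Q = g^0*D_0 xor g^1*D_1 xor ... xor g^(n-1)*D_(n-1)

where the multiplications are done in GF(2^8) with generator g = {02}.
Raid-5 only keeps P; raid-6 also maintains Q, which is where the extra
computation goes.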
  #20  
Old April 24th 09, 03:33 PM posted to comp.os.linux.development.system,comp.arch.storage
kkkk

This guy

http://lists.freebsd.org/pipermail/f...er/005170.html

is doing basically the same thing I am doing, with software raid done
with ZFS on FreeBSD (raid-Z2 is basically raid-6), writing and reading
10GB files. His results are a heck of a lot better than mine with
default settings, and not far from the bare hard disk throughput (he
seems to get about 50MB/sec per non-parity disk).

This shows that software raid is indeed capable of performing well.
It's just that linux MD + ext3 seems to have some performance problems :-(
 



