HardwareBanter: a computer components & hardware forum



Increasing disk performance with many small files (NTFS/ Windowsroaming profiles)



 
 
  #11  
Old July 19th 04, 10:45 PM
Ron Reaugh


"Benno..." wrote in message
...
I set up a test server to do some performance tests. I collected a
dataset of 26 profiles (216MB in 46,075 files and 1773 directories).
Copying them from the server to a workstation gives an average speed of
420KByte/sec (the test server is newer and therefore has better-performing
disks/array controller than the production server in my previous post).


Try the same experiment twice, once pushing (xcopy on the server) the file
set and once pulling (xcopy on the workstation) the file set.

The production server gets around 230KB/sec on the test
dataset).
If I copy this dataset on the server itself from the RAID1 boot/system
partition to the RAID5 data partition I see 2500KByte/sec.




  #12  
Old July 19th 04, 10:47 PM
Ron Reaugh


"Folkert Rienstra" wrote in message
...
You choose your stripe size depending on filesize and stripe width.

If you change the stripe width


Could you please cite a reference for your use of the term "stripe width".

without changing the stripe size then small


Could you please cite a reference for your use of the term "stripe size".


  #13  
Old July 20th 04, 07:26 AM
Marc de Vries

On Mon, 19 Jul 2004 21:45:09 GMT, "Ron Reaugh"
wrote:


"Benno..." wrote in message
...
I set up a test server to do some performance tests. I collected a
dataset of 26 profiles (216MB in 46,075 files and 1773 directories).
Copying them from the server to a workstation gives an average speed of
420KByte/sec (the test server is newer and therefore has better-performing
disks/array controller than the production server in my previous post).


Try the same experiment twice, once pushing (xcopy on the server) the file
set and once pulling (xcopy on the workstation) the file set.


Good idea. But I wonder whether the roaming profile copy operation will be
that effective at copying. I expect that it behaves more like a normal
copy.

The production server gets around 230KB/sec on the test
dataset).
If I copy this dataset on the server itself from the RAID1 boot/system
partition to the RAID5 data partition I see 2500KByte/sec.


Seems like the network doesn't like all those small files either. I'm
not sure if or how you can improve the situation there. You'll need
some network guys for that.

Marc
  #14  
Old July 20th 04, 04:25 PM
Alexander Grigoriev

If you had XP on the workstations, you could use the client-side cache
(offline files). If you have security concerns about it, those cached files
can be encrypted on the client.

"Benno..." wrote in message
...
Due to applications such as the SAP client and AutoCAD 2002, our users'
roaming profiles contain thousands of very small files. I have noticed that
the average transfer rate of those small files (~350 bytes in size) over the
network is extremely slow compared to normal- to large-sized files (300KB
up to a few MB). With the normal-sized files I'm seeing transfer rates
to the workstations of 4MB to 15MB per second; with the small files this
drops to as low as 75KB per second, with an average of ~200KB per second.

The roaming profiles are stored on a RAID5 logical drive with a 64KB
stripe size (I think this is the maximum for the Smart Array 5300
controller) and the NTFS partition is formatted with the default 4KB
cluster size. The Array Controller cache is configured 25% read / 75%
write to compensate for the RAID5 slower writes.

The server is a Windows 2000 SP4 machine, the workstations are NT4 SP6a.
The network is 100Mb switched with a 1000Mb connection to the fileserver.

Is there anything I can do with the RAID stripe size or the cluster size
to increase the throughput of those small files without affecting the
transfer speed of the normal-sized files too much?

Are there any benchmark programs that I can use to test this?

Could the TCP/IP window size be an issue here?

--
Thanks,
Benno...
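Benno asks above about benchmark programs. Lacking one, a minimal Python sketch (hypothetical, local-disk only, so it measures filesystem and per-file overhead rather than the network) can at least compare copying many tiny files against one file of the same total size:

```python
import os
import shutil
import tempfile
import time

def time_copy(src_dir, dst_dir):
    """Copy every file in src_dir to dst_dir; return average bytes/sec."""
    total = 0
    t0 = time.perf_counter()
    for name in os.listdir(src_dir):
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        total += os.path.getsize(os.path.join(dst_dir, name))
    return total / (time.perf_counter() - t0)

with tempfile.TemporaryDirectory() as root:
    # Build a test set: 200 small (350-byte) files vs one file of the same
    # total size, copied into separate destination directories.
    small_src, big_src = os.path.join(root, "small"), os.path.join(root, "big")
    small_dst, big_dst = os.path.join(root, "sd"), os.path.join(root, "bd")
    for d in (small_src, big_src, small_dst, big_dst):
        os.mkdir(d)
    for i in range(200):
        with open(os.path.join(small_src, f"f{i:03d}"), "wb") as f:
            f.write(b"x" * 350)
    with open(os.path.join(big_src, "one"), "wb") as f:
        f.write(b"x" * 350 * 200)

    rate_small = time_copy(small_src, small_dst)
    rate_big = time_copy(big_src, big_dst)
    copied = len(os.listdir(small_dst))
    print(f"small files: {rate_small:,.0f} B/s")
    print(f"one large:   {rate_big:,.0f} B/s")
```

On most systems the many-small-files rate comes out far lower, mirroring the 75-200KB/sec Benno reports, although a local copy will not show the network's per-file round trips.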



  #15  
Old July 20th 04, 07:32 PM
Marc de Vries

On Mon, 19 Jul 2004 16:07:23 +0200, "Folkert Rienstra"
wrote:


"Marc de Vries" wrote in message
On Mon, 19 Jul 2004 09:28:50 +0200, "Benno..." wrote:

Benno... wrote:

The roaming profiles are stored on a RAID5 logical drive with a 64KB
stripe size (I think this is the maximum for the Smart Array 5300
controller) and the NTFS partition is formatted with the default 4KB
cluster size. The Array Controller cache is configured 25% read / 75%
write to compensate for the RAID5 slower writes.

I was thinking: the RAID5 drive consists of 6 disks. Normally, the more
spindles the better the performance, but is this also true with those
very small files? Could a large number of spindles have a negative
performance effect?


More spindles also give better performance with those very small
files.


Nope, only on busy servers that do a lot of them simultaneously.


If the server were doing nothing, he wouldn't have bought a Smart Array
5300 controller for it.

The array controller then has the option to read multiple files
simultaneously from different spindles.


Therefore it still reads at full stripe-width transfer rates.


Which is not important for those very small files at all.

Especially with small files you should set the stripe size to the
maximum that the controller supports. So it already has the optimum
setting.


Not if this is not a "busy" server. The bigger the stripe size, the more
small files sit on a single disk and transfer at single-disk speeds.
If that's not compensated by the sheer number of them being read
simultaneously all the time, then you lose.


Wrong. As I have explained to you time and again in the past, the
transfer rate is not important for small files. Why don't you listen!

When you read thousands of 350-byte files it doesn't matter
whether I read them at a 30MB/s transfer rate or a 300MB/s transfer
rate. The time to get the file depends almost solely on the time that is
needed to seek and open the file. That takes 90% of the total time to
get the file. Since the transfer only takes 10% of the time, the
impact of a faster transfer rate is negligible. (Rough estimates; it
will be even less for 350-byte files, but you will remember these
numbers from a few days ago when I explained it to you in detail.)
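Marc's 90/10 argument can be put in numbers. A small sketch of the model (assumed figure: 12 ms average access time per file; the 30MB/s and 300MB/s raw rates are his own examples):

```python
def effective_rate(file_bytes, str_bytes_per_s, access_s=0.012):
    """Average throughput for one file once per-file access time is counted:
    size / (access time + size / sustained transfer rate)."""
    return file_bytes / (access_s + file_bytes / str_bytes_per_s)

size = 350                             # the ~350-byte profile files
r_slow = effective_rate(size, 30e6)    # 30 MB/s raw STR
r_fast = effective_rate(size, 300e6)   # 300 MB/s raw STR
# Both land near 29 kB/s: at this file size the raw STR barely matters.
print(f"{r_slow:.0f} B/s vs {r_fast:.0f} B/s")
```

A tenfold increase in raw transfer rate changes the effective rate by well under 1%, which is the point being argued.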

(The idea behind that is that the small files are stored on as few disks
as possible, which will increase the chance that you can read several files
simultaneously)


So you actually make them slower, in order to read more of them simultaneously.


Almost right.
I actually make the transferring of the small files slower by 0.00001%
because of a lower transfer rate, and then make the transfer of the
total set of files faster by 400% because I can open multiple files at
once.

On a not-so-busy server you are ensuring that the small files will transfer
even slower compared to doing nothing. Nice one.


On a server that is only opening one file at a time (not a realistic
scenario), I am slowing down the opening of that file by 0.00001%.

So who cares about that? (Well, you do obviously, but anyone actually
using that server in real life won't.)

When you leave it as it is, as you thought best, at least you don't
make the files that fill a stripe width slower; and when they are smaller
than that, the smaller files automatically fill up the gap when the server
is busy and has many outstanding IOs.


But whatever you do, you might increase performance, but you will
never get large transfer rates with small files.


Right, now for yourself to let that sink in.


How about you actually start to listen to what I explain to you and
LEARN from it?

The reason for that is the following: The time it takes to search for
that file on the disk and open it is very large in comparison to the
time it takes to transfer it.

The same might also happen on the network. I'm not sure what overhead
you get on small files in the network. But you might want to take a
look at what happens when you access those small files directly on the
server, compared to what happens when you access them over the
network.

Are you doing a lot of writing to the array controller? You have to
accept that writes will never be fast. Extra cache will not fix that.


Not on a busy server, no. And not if the write speed is not disk-related.
It will help if it is, and if the cache can catch up in less busy periods,
acting as a buffer.

And it


What "it"?

might slow the reads down a bit, in such a way that on
average the user experience is slower.
I don't have personal experience with roaming profiles, but I'd guess
they require more read than write capacity. More cache in the array
controller might help, if you have the option to add more.

You could experiment with the cluster size, but I don't think that
that will help. The cause of the slow performance is the relatively
large seektime when accessing small files.


And that doesn't change when the clustersize is smaller.


Actually it does, when already-small files fragment because of it.


How about you start reading the thread again.
We are talking about files that are about 350 bytes in size!

The smallest cluster size that NTFS supports is already much bigger
than that: 512 bytes. So how are these files going to be fragmented by
that cluster size?

I'm afraid that I can't think of much to improve the situation.
Basically the applications shouldn't create so many extremely small
files, because that will always hurt performance.


Unless they sit on a dedicated drive that is not mechanical in nature:
a solid-state disk.


That could be an option. But even though there are lots of very small
files in the roaming profile, the rest of the roaming profile could be
very big. Windows probably won't let the profile be stored on two
different types of disk.

(Your backup software is probably not too happy about it either)


That obviously depends on the type of backup.


Obviously that is also the reason why I said that the backup is
PROBABLY not too happy about it.
For some backup methods it indeed doesn't matter.

Marc
  #16  
Old July 20th 04, 07:37 PM
Marc de Vries

On Mon, 19 Jul 2004 16:20:13 +0200, "Folkert Rienstra"
wrote:

"Benno" wrote in message
Due to applications such as the SAP client and AutoCAD 2002, our users'
roaming profiles contain thousands of very small files. I have noticed that
the average transfer rate of those small files (~350 bytes in size) over the
network is extremely slow compared to normal- to large-sized files (300KB
up to a few MB). With the normal-sized files I'm seeing transfer rates
to the workstations of 4MB to 15MB per second; with the small files this
drops to as low as 75KB per second, with an average of ~200KB per second.


512 bytes (one sector) or 4 kB (one cluster) resides in a single 64kB
stripe, so it transfers at single-drive speed.

At an STR of 51MB/s such a read completes in .1 ms or .4 ms.

With an average access time of 12 ms your average transfer rate ranges from
(.1/12.1)*51MB/s = 420kB/s to (.4/12.4)*51MB/s = 1.65MB/s.

Your 350-byte file may run at 350/4096*1.65 MB/s = 400KB/s.
(And yes, because of that huge difference between access time and actual
transfer time, it is trivial whether the disk system reads a sector or a
cluster.)


So you have finally accepted what I have been telling you for weeks:
that the transfer rate of the array is not important for small files
because the access time is so much bigger: 12 ms vs 0.4 ms.

I'm glad that you are apparently capable of listening and learning after
all.


Marc
  #17  
Old July 22nd 04, 12:03 AM
Folkert Rienstra

"Marc de Vries" wrote in message
On Mon, 19 Jul 2004 16:20:13 +0200, "Folkert Rienstra" wrote:

"Benno" wrote in message
Due to applications such as the SAP client and AutoCAD 2002, our users'
roaming profiles contain thousands of very small files. I have noticed that
the average transfer rate of those small files (~350 bytes in size) over the
network is extremely slow compared to normal- to large-sized files (300KB
up to a few MB). With the normal-sized files I'm seeing transfer rates
to the workstations of 4MB to 15MB per second; with the small files this
drops to as low as 75KB per second, with an average of ~200KB per second.


512 bytes (one sector) or 4 kB (one cluster) resides in a single 64kB
stripe, so it transfers at single-drive speed.

At an STR of 51MB/s such a read completes in .1 ms or .4 ms.

With an average access time of 12 ms your average transfer rate ranges from
(.1/12.1)*51MB/s = 420kB/s to (.4/12.4)*51MB/s = 1.65MB/s.

Your 350-byte file may run at 350/4096*1.65 MB/s = 400KB/s.
(And yes, because of that huge difference between access time and actual
transfer time, it is trivial whether the disk system reads a sector or a
cluster.)


So you have finally accepted what I have been telling you for weeks:


Which I fully debunked.

that the transfer rate


Still doesn't understand transfer rate.

of the array is not important for small files
because the access time is so much bigger: 12 ms vs 0.4 ms.


Still mighty clueless. Notice that 51 MB/s in the formulas? That is the STR.
Change it and the average transfer rate changes with it, 1:1.

The same files on an array will transfer n times (n = stripe width)
faster than they would on a single drive when accessed simultaneously.

So when an application reads several small files at once it can, with
some luck, read them n times faster.
On a 6-drive RAID5 array those 4kB files can be read at 5*1.6 = 8 MB/s
compared to 1.6 MB/s when read serially. That is a 400% improvement!

However, the ratio of pure transfer time to total transfer time in
that same formula dramatically worsens the result for a striped small file.
So for a small file just around the size of a stripe width, the performance
is barely any better than when run from a single drive. It is this type of
small file that can see a dramatic performance improvement when it is not
striped (upping the stripe size to the file size), so that several of those
files may be read simultaneously.

The effect, however, wears off quickly as the files get smaller.
The biggest effect is near the stripe-size transition point, and it is gone
completely below the 1/(n-1) stripe-size point.

The 350-byte files are definitely below that.


I'm glad that you are apparently capable of listening and learning after all.


Which cannot be said of you when I washed your ears recently and you
keep mixing up terms.



Marc


  #18  
Old July 22nd 04, 12:39 AM
Folkert Rienstra

"Marc de Vries" wrote in message news
On Mon, 19 Jul 2004 16:07:23 +0200, "Folkert Rienstra" wrote:
"Marc de Vries" wrote in message
On Mon, 19 Jul 2004 09:28:50 +0200, "Benno..." wrote:

Benno... wrote:

The roaming profiles are stored on a RAID5 logical drive with a 64KB
stripe size (I think this is the maximum for the Smart Array 5300
controller) and the NTFS partition is formatted with the default 4KB
cluster size. The Array Controller cache is configured 25% read / 75%
write to compensate for the RAID5 slower writes.

I was thinking: the RAID5 drive consists of 6 disks. Normally, the more
spindles the better the performance, but is this also true with those
very small files? Could a large number of spindles have a negative
performance effect?

More spindles also give better performance with those very small
files.


Nope, only on busy servers that do a lot of them simultaneously.


If the server were doing nothing, he wouldn't have bought a Smart Array
5300 controller for it.


You never cease to amaze me.
What has that got to do with a "busy server" and doing parallel IO?


The array controller then has the option to read multiple files
simultaneously from different spindles.


Therefore it still reads at full stripe-width transfer rates.


Which is not important for those very small files at all.


Of course it is. You wouldn't be making the effort if it wasn't so.
As for those 350-byte files it won't make any difference; that I will agree.


Especially with small files you should set the stripe size to the
maximum that the controller supports. So it already has the optimum
setting.


Not if this is not a "busy" server. The bigger the stripe size, the more
small files sit on a single disk and transfer at single-disk speeds.
If that's not compensated by the sheer number of them being read
simultaneously all the time, then you lose.


Wrong. As I have explained to you time and again in the past, the
transfer rate is not important for small files. Why don't you listen!


Because you are wrong, and I proved it.


When you read thousands of 350-byte files it doesn't matter
whether I read them at a 30MB/s transfer rate or a 300MB/s transfer
rate.


Even you shouldn't be *that* clueless. That set of files will transfer 10
times faster when that transfer rate is aggregated by a 10-drive array.

And btw, it doesn't make one jot of difference what stripe size you
choose, because those 350-byte files will never sit on more than one
stripe, whatever you do. Even with a 2kB stripe size you still read
5 of those 350-byte files simultaneously.

The time to get the file depends solely on the time that is
needed to seek and open the file.


You miss the whole point.

That takes 90% of the total time to get the file.


So what, it results in a certain (average) transfer rate.
With raid it results in n-times that (average) transfer rate.

Since the transferrate


Still has no clue about transfer rate.

only takes 10% of the time the impact of a faster transferrate is neglictable.


And it's not even a comprehensible sentence.

(rough estimates, it will be even less for 350 bytes files, but you will remember


these numbers from a few days ago when I explained it to you in detail)


You mean, when I explained it to *you*, don't you, troll?
There was a distinct *lack* of detail in *your* post, and your constant
contradicting of yourself made it so hard to detect where you were wrong.


(The idea behind that is that the small files are stored on as few disks
as possible, which will increase the chance that you can read several files
simultaneously)


So you actually make them slower, in order to read more of them simultaneously.


Almost right.


Exactly right.

I actually make the transferring of the small files slower by 0.00001%


Actually, it is far more complex than that. Your small files are now 64kB.

because of a lower transferrate


Some 13% lower transfer rate for those 64kB files.

and then make the transfer of the total set of files faster
by 400% because I can open multiple files at once.


Right, time for some examples:
64kB files, stripe size 13kB, 5-drive RAID0, 51MB/s per drive, 12ms access time.
A 64kB file is forced onto a full stripe width. 13kB transfers in 13/51 = .25 ms,
at an average single-drive transfer rate of (.25/12.25)*51MB/s = 1.1MB/s.
So the total average file transfer rate over 5 drives is 5*1.1MB/s = 5.5MB/s.

Now you go to a 64kB stripe size so that several small files 'supposedly' can
transfer at the same time:
64kB files, stripe size 64kB, 5-drive RAID0, 51MB/s per drive, 12ms access time.
64kB is now one stripe. It transfers in 64/51 ms = 1.25 ms,
at an average single-drive transfer rate of (1.25/13.25)*51 MB/s = 4.8 MB/s.
That is 13% slower; that is over a million times your 0.00001%.

Five 64kB files read at an aggregated average transfer rate of 24MB/s.
A whopping 335% improvement over striping a single file.

Now for 39kB files.
In the 13kB-stripe example the single-drive transfer rate was 1.1 MB/s. The
single file now transfers at 3.3MB/s. Reading several files at once gets you
~5.5MB/s.
In the 64kB-stripe example the transfer rate is (.75/12.75)*51MB/s = 3MB/s.
Reading 5 files at once gets you 15MB/s. Still a 170% improvement.

Now for 26kB files.
In the 13kB-stripe example the single-drive transfer rate was 1.1 MB/s.
The file transfers at 2.2MB/s. Reading several files at once gets you ~5.5MB/s.
In the 64kB-stripe example the file transfer rate is (.5/12.5)*51MB/s = 2MB/s.
Reading several files at once gets you 10 MB/s. That is now only a mere 80%
improvement.

Now for 13kB files.
In the 13kB-stripe example the single-drive transfer rate was 1.1 MB/s.
Reading several files at once gets you 5.5MB/s.
In the 64kB-stripe example the single-file transfer rate is still 1.1 MB/s.
Reading several files at once still gets you only 5.5MB/s.
No improvement at all anymore. Vanished. Poof.

Your improvement only takes place for files between the stripe
size and 1/(n-1) of the stripe size. Anything below that has no effect.

The 350-byte files fall well below that.
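The worked examples above all come from one formula; a sketch reproducing two of them (using the thread's assumptions: 51 MB/s per-drive STR, 12 ms access time, 5-drive RAID0; the thread's rounding of 1.04 up to 1.1 MB/s explains the small differences):

```python
def drive_rate_mb_s(chunk_kb, str_mb_s=51.0, access_ms=12.0):
    """Average rate of one drive delivering chunk_kb after one 12 ms access.
    kB divided by MB/s conveniently gives milliseconds."""
    t_ms = chunk_kb / str_mb_s
    return (t_ms / (access_ms + t_ms)) * str_mb_s

# 64 kB file, 13 kB stripe size: the file spans all 5 drives, 13 kB each.
striped = 5 * drive_rate_mb_s(13)     # ~5.3 MB/s (the thread rounds to 5.5)
# 64 kB stripe size: each file sits on one drive, but 5 files read at once.
parallel = 5 * drive_rate_mb_s(64)    # ~24 MB/s
print(f"{striped:.1f} MB/s vs {parallel:.1f} MB/s")
```

Calling drive_rate_mb_s for 39, 26, and 13 kB chunks reproduces the other rows, with the advantage of the large stripe size shrinking toward zero at the 1/(n-1) stripe-size boundary.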


On a not-so-busy server you are ensuring that the small files will transfer
even slower compared to doing nothing. Nice one.


On a server that is only opening one file at a time (not a realistic
scenario), I am slowing down the opening of that file by 0.00001%.


13%.


So who cares about that? (Well, you do obviously, but anyone actually
using that server in real life won't.)

When you leave it as it is, as you thought best, at least you don't
make the files that fill a stripe width slower; and when they are smaller
than that, the smaller files automatically fill up the gap when the server
is busy and has many outstanding IOs.


But whatever you do, you might increase performance, but you will
never get large transfer rates with small files.


Right, now for yourself to let that sink in.


How about you actually start to listen to what I explain to you and
LEARN from it?


I finally did, and what did I discover? That you are full of ****,
exactly as I had anticipated. Yes, I learned a great deal from you.


The reason for that is the following: The time it takes to search for
that file on the disk and open it is very large in comparison to the
time it takes to transfer it.

The same might also happen on the network. I'm not sure what overhead
you get on small files in the network. But you might want to take a
look at what happens when you access those small files directly on the
server, compared to what happens when you access them over the
network.

Are you doing a lot of writing to the array controller? You have to
accept that writes will never be fast. Extra cache will not fix that.


Not on a busy server, no. And not if the write speed is not disk-related.
It will help if it is, and if the cache can catch up in less busy periods,
acting as a buffer.

And it


What "it"?

might slow the reads down a bit, in such a way that on
average the user experience is slower.
I don't have personal experience with roaming profiles, but I'd guess
they require more read than write capacity. More cache in the array
controller might help, if you have the option to add more.

You could experiment with the cluster size, but I don't think that
that will help. The cause of the slow performance is the relatively
large seektime when accessing small files.


And that doesn't change when the clustersize is smaller.


Actually, I took that as a general comment, not necessarily about the OP's
4kB clusters and 350-byte files.


Actually it does when already small files fragment because of it.


How about you start reading the thread again.
We are talking about files that are about 350 bytes in size!


In theory we are talking about small files up to 64kB (the stripe size).
Theoretically such a file can be in 16 fragments with a 4kB cluster size.

While fragmenting is bad when it is on the same drive, it can be beneficial
if the fragments are on separate drives in an array.


The smallest cluster size that NTFS supports is already much bigger
than that: 512 bytes. So how are these files going to be fragmented by
that cluster size?


Those obviously not.
The ones bigger than 4kB and up to 64kB obviously can.


I'm afraid that I can't think of much to improve the situation.
Basically the applications shouldn't create so many extremely small
files, because that will always hurt performance.


Unless they sit on a dedicated drive that is not mechanical in nature:
a solid-state disk.


That could be an option. But even though there are lots of very small
files in the roaming profile, the rest of the roaming profile could be
very big. Windows probably won't let the profile be stored on two
different types of disk.

(Your backup software is probably not too happy about it either)


That obviously depends on the type of backup.


Obviously that is also the reason why I said that the backup is
PROBABLY not too happy about it.
For some backup methods it indeed doesn't matter.

Marc

  #19  
Old July 23rd 04, 12:41 PM
Marc de Vries

On Thu, 22 Jul 2004 01:39:23 +0200, "Folkert Rienstra"
wrote:

snip
The array controller then has the option to read multiple files
simultaneously from different spindles.

Therefor still reads at full stripe width transfer rates.


Which is not important for those very small files at all.


Of course it is. You wouldn't be making the effort if it wasn't so.
As for those 350-byte files it won't make any difference; that I will agree.


So even though you constantly said in this thread that I was wrong, you
have now changed your mind and agree that I was actually right all
along.

Nice to finally see that statement in black and white. End of
discussion.

Marc
 







