A computer components & hardware forum. HardwareBanter


Who/what is abusing my fileserver



 
 
  #1  
Old May 6th 17, 03:53 PM posted to comp.arch.storage
[email protected]

Usually our TrueNAS fileservers (really just FreeBSD with a GUI) perform well, with

iostat -x

showing hundreds of megabytes/second read or written while %b (%busy, i.e. utilization) sits at only a few percent for each disk. But every few months performance goes to hell: total throughput drops to only 1 or 2 MB/s, %b for a group of disks pegs at 99% or 100%, and qlen grows from 0 or 1 to a dozen or twenty on some disks. CPU utilization stays very low. While this is happening a simple ls command can take 5 minutes. Eventually the problem solves itself.
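(Not part of the original post: the pattern described above — disks nearly 100% busy while moving very little data, with a growing queue — can be picked out of iostat output mechanically. A rough sketch, assuming the column layout of FreeBSD's `iostat -x`; the sample text and thresholds are illustrative, so adjust the field indices if your iostat version prints different columns.)

```python
# Rough sketch: flag disks that are busy but moving little data --
# the seek-bound signature (high %b, low throughput, long qlen).
# Column layout is assumed from FreeBSD's `iostat -x`; adjust the
# indices if your version prints different fields.

SAMPLE = """\
device     r/s   w/s    kr/s    kw/s  qlen  %b
da0        210    15  105000    2100     1   4
da1          9     3      45      12    17  99
da2          8     2      38       9    14 100
"""

def flag_seek_bound(iostat_text, busy_pct=90, max_kbs=1000, min_qlen=5):
    """Return devices that look seek-bound: nearly 100% busy,
    little data transferred, and a long request queue."""
    suspects = []
    for line in iostat_text.strip().splitlines()[1:]:  # skip header
        fields = line.split()
        dev = fields[0]
        kbs = float(fields[3]) + float(fields[4])  # kr/s + kw/s
        qlen = int(fields[5])
        busy = float(fields[6])
        if busy >= busy_pct and kbs <= max_kbs and qlen >= min_qlen:
            suspects.append(dev)
    return suspects

print(flag_seek_bound(SAMPLE))  # da1 and da2 match the pattern
```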

We believe this is because a client is doing a lot of random I/O that
keeps the heads moving for very little data transfer, and that with all
that seeking none of the other clients get much attention. How do we
locate that job among the many jobs from many users on many NFS clients?
On the client computers we can find out how many bytes are transferred by
each process, but that number is small for all jobs - the one doing random
I/O doesn't get more bytes than the jobs doing sequential I/O, it just
exercises the heads more. We need more information to contact the user
doing random I/O and work with them to do something else.
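(Editorial addition, not from the original post: since byte counts alone can't separate the random-I/O client from the sequential ones, a per-client "sequentiality" metric can. However you capture per-client (file, offset, length) request tuples — e.g. tracing on the server with DTrace, if your FreeBSD build exposes NFS probes — the fraction of requests that do not continue where the previous one on the same file ended singles out the seek-heavy client. The traces below are made-up examples.)

```python
# Sketch of a per-client sequentiality metric: what fraction of a
# client's requests are NOT sequential with the previous request on
# the same file? A sequential reader scores near 0, a random one
# near 1, even when both transfer the same number of bytes.

def random_fraction(requests):
    """requests: list of (file_id, offset, length) in arrival order."""
    last_end = {}   # file_id -> offset just past the previous request
    seeks = 0
    for fid, off, length in requests:
        if fid in last_end and off != last_end[fid]:
            seeks += 1
        last_end[fid] = off + length
    return seeks / max(len(requests), 1)

# A client reading one file straight through:
seq = [("f", i * 4096, 4096) for i in range(100)]
# A client hopping all over the same file:
rnd = [("f", (i * 7919 * 4096) % (1 << 30), 4096) for i in range(100)]

print(random_fraction(seq))  # 0.0
print(random_fraction(rnd))  # close to 1.0
```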

Alternatively, is there some adjustment of the server that will downgrade
the priority of random access? That user might self-identify if his jobs
took forever to complete.

Daniel Feenberg
NBER
  #2  
Old May 7th 17, 04:11 PM posted to comp.arch.storage
Mark F[_2_]

On Sat, 6 May 2017 07:53:44 -0700 (PDT), [email protected] wrote:

> Usually our TrueNAS fileservers (Really just FreeBSD with a GUI) perform well with
>
> iostat -x
>
> showing hundreds of megabytes/second read or written with the %b (%busy or %Utilization) at only several percent for each disk. But every few months performance goes to hell, with total throughput only 1 or 2 mbs and %b for group of disks at 99% or 100% while qlen grows from 0 or 1 to a dozen or 20 on some disks. CPU utilization stays very low. While this is happening a simple ls command can take 5 minutes. Eventually the problem solves itself.
>
> We believe this is because a client is doing a lot of random I/O that
> keeps the heads moving for very little data transfer, and that with all

Could also be error recovery on a couple of blocks.
I don't know about TrueNAS, but many filesystems/operating systems
don't fix ECC problems until there is a complete failure, and the disks
themselves try to avoid actually rewriting data, possibly with
relocation, to fix the problems.

You could scan the disks and see if any performance problems arise.
Save the SMART data before and after the scan to see if there is any
evidence of excessive error correction taking place, though not all
disks (or SSDs) report such information. You might see counts for
on-the-fly error recovery (which will seldom be zero even when there
are no real problems), and perhaps second- or even third-level
recovery counts, even if the drive never goes into a full retry
method (which can take several minutes).

The SMART data may even include a count of sectors known to be bad
but not yet fixed.
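(Editorial addition: the before/after comparison suggested above amounts to snapshotting the error-related raw counters, running the scan, and reporting which counters grew. A minimal sketch — the snapshot values are made up, and in practice you would pull them from `smartctl -A /dev/da0`; attribute names vary by vendor.)

```python
# Sketch of a before/after SMART comparison: which error-related
# raw counters increased across a full-disk scan? The snapshots
# here are invented; real ones would come from `smartctl -A`.

WATCH = {"Raw_Read_Error_Rate", "Reallocated_Sector_Ct",
         "Current_Pending_Sector", "Hardware_ECC_Recovered"}

def grew(before, after, watch=WATCH):
    """before/after: dicts of attribute name -> raw value.
    Returns {name: increase} for watched counters that went up."""
    return {name: after[name] - before[name]
            for name in watch
            if name in before and name in after
            and after[name] > before[name]}

before = {"Raw_Read_Error_Rate": 0, "Reallocated_Sector_Ct": 8,
          "Current_Pending_Sector": 0, "Hardware_ECC_Recovered": 14025}
after  = {"Raw_Read_Error_Rate": 0, "Reallocated_Sector_Ct": 8,
          "Current_Pending_Sector": 3, "Hardware_ECC_Recovered": 15990}

print(grew(before, after))  # pending sectors and ECC recoveries grew
```

A growing Current_Pending_Sector or second-level recovery count during the scan would point at error recovery rather than a random-I/O client.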


 





