VanguardLH - April 28th 20, 06:40 AM
Posted to alt.comp.os.windows-10, comp.sys.ibm.pc.hardware.storage
Subject: Why is this folder so slow?

Yousuf Khan wrote:

> On 4/27/2020 2:29 PM, VanguardLH wrote:
>> As a test, disable your anti-virus software and run your TB data-only
>> backup job.


> Yes, that's been done years ago too. This folder has been a major
> headache for years now. And at one time, I found that the AV software
> was spending tons of time scanning this folder too, so I put in an
> exclusion for this folder. The AV doesn't ever scan this folder
> anymore.


I just thought of something else: is that folder flagged as a special
folder? Right-click on the folder, and select Properties. Is there a
Customize tab? If so, select it, and check the setting for "Optimize
this folder for". Set it to "General items" (instead of "Pictures").

> It's not related to VSS. I've already given you the most likely cause
> of the problem: there are over half a million files, and each file is
> inefficiently taking up a little over half of an NTFS cluster, rather
> than a lesser number of files each spread over many clusters. The real
> question is how can we make NTFS more efficient at handling all of
> these little files? NTFS is great at handling big files, but tiny
> little files not so much.


Slack space is also a problem with FAT16/32, ext, or any other file
system where the AU (Allocation Unit) is a cluster, i.e., a group of
sectors. The file system allocates enough whole clusters to encompass
the file, so the allocated space is always equal to or larger than the
file's content. Slack space is *not* just an NTFS problem.
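
Slack is easy to put a number on: round each file's size up to the next
whole cluster and compare the total against the actual bytes of data.
A quick Python sketch; the 4 KB cluster size and the folder path are my
assumptions, not values read from your volume (check yours with
"fsutil fsinfo ntfsinfo D:"):

  import os

  CLUSTER = 4096              # bytes per cluster (assumed)
  folder = r"D:\SlowFolder"   # hypothetical example path

  files = data = allocated = 0
  with os.scandir(folder) as it:
      for entry in it:
          if entry.is_file(follow_symlinks=False):
              size = entry.stat().st_size
              files += 1
              data += size
              # round up to whole clusters; files resident in the MFT
              # occupy no clusters at all, so this overstates their slack
              allocated += -(-size // CLUSTER) * CLUSTER

  print(f"{files} files, {data:,} bytes of data, "
        f"{allocated - data:,} bytes of slack at {CLUSTER}-byte clusters")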

For NTFS, files small enough to fit in an MFT file record are stored
inside the MFT, since the record already has enough space to hold them.
Instead of the MFT file record holding a pointer to the small file
outside the MFT, where there would be a lot of slack space (the small
file is nowhere near the size of a cluster), the MFT file record *is*
the file.

An MFT file record is 1 KB in size. If a file is smaller than that, its
data is stored right in the MFT record. Actually, the usable space is
somewhat less than 1 KB, because the MFT file record has a fixed
42-byte header at its start and also has to hold the file name and the
standard attributes.

https://hetmanrecovery.com/recovery_...ucture.htm#id4
According to specifications, MFT record size is determined by the
value of a variable in the boot sector. In practical terms, all
current versions of Microsoft Windows are using records sized 1024
bytes. The first 42 bytes store the header. The header contains 12
fields. The other 982 bytes do not have a fixed structure, and are
used to keep attributes.
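
So the space actually left for a resident file's data is well under
1 KB. Back-of-the-envelope arithmetic, where everything except the 1024
and the 42 is a rough estimate of mine rather than a number from the
NTFS spec:

  RECORD    = 1024   # MFT file record size on current Windows
  HEADER    = 42     # fixed record header
  STD_INFO  = 96     # $STANDARD_INFORMATION plus its attribute header (rough)
  FILE_NAME = 106    # $FILE_NAME for a shortish name (rough; grows with name)
  MISC      = 30     # end marker, alignment, etc. (rough)

  print(RECORD - HEADER - STD_INFO - FILE_NAME - MISC,
        "bytes, very roughly, left for resident file data")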

The MFT is not infinite in size. NTFS has a limit of 4,294,967,295
files per disk (well, per volume). Your 580,000 files are only about
0.01% of NTFS's capacity for file count. Obviously there are lots of
files elsewhere in that volume.

NTFS doesn't have a problem addressing small files versus large files.
It's the level of fragmentation that causes a problem. Yeah, you think
you don't need to, and should not, defragment an SSD because, after
all, accessing memory at one address is the same speed as accessing it
at any other address. However, NTFS cannot support an infinite chain of
fragments for a file. Each fragment consumes mapping space in the
file's MFT record, and once that base record fills up, NTFS has to
allocate extension records (additional MFT records beyond the file's
base record) to track the rest. There are limitations in every file
system. Around 1.5 million fragments is the limit per file under NTFS.
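
If you want to see how chopped up one of those files actually is,
recent Windows 10 builds of fsutil can list a file's extents. A sketch,
assuming the "file queryextents" verb exists on your build (run it
elevated if it complains about access); the file path is only an
example:

  import subprocess

  path = r"D:\SlowFolder\example.jpg"   # hypothetical file
  result = subprocess.run(["fsutil", "file", "queryextents", path],
                          capture_output=True, text=True)
  print(result.stdout or result.stderr)
  extents = [line for line in result.stdout.splitlines() if line.strip()]
  print("Roughly", len(extents), "extents (fragments) reported")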

Doesn't Thunderbird have a compaction function? Used it yet? I don't
know if that will eliminate any fragmentation of the files used to
store the messages or articles which, as I recall, are stored as
separate files instead of inside a database, but I haven't used TB in a
long time.

Users think they never need to defragment an SSD. All those extra
writes with no effective change in data content do reduce the lifespan
of the SSD (writes are destructive), so, sure, when a file has only a
few or a dozen fragments, the extra writes to defragment it just wear
the SSD for no gain. However, it takes time to chain from the MFT's
base record through every extension record (which consumes space in the
file system) to build up the entire file. It's not one lookup in the
MFT for the file; it's a chained lookup for every fragment. IOPS
increase as fragmentation increases, which is perhaps why you are
seeing high CPU usage when backing up those files.

Most users think of fragmentation as a performance issue only with
moving physical media, like hard disks. Fragmentation ON ANY MEDIA is
still an I/O overhead issue and inflates the IOPS needed to process the
files. Yes, there is a limit in NTFS to the number of fragments that a
file may have, but the more fragments there are, the more space is
consumed in the file system to track them and the more CPU is consumed
to process them. When the OS sees a file comprised of multiple
fragments, it takes multiple I/O operations to process the whole file:
if Windows sees 20 pieces at the logical layer, there are 20 I/O
operations to read or write the whole file.

Fragmentation is not just a performance issue at the physical layer. It
is also a performance factor at the logical layer (the file system).
Extreme fragmentation usually comes from lots of repeated writes to a
file. I don't know what you've been doing with those files in the
problematic folder. If they are photos, you rarely edit those; you just
copy them.

Similarly, a backup job has to perform the IOPS needed to read every
file included in the backup. I have under 400,000 files on my entire
OS+app drive (which is a partition spanning the entire SSD). You have
more than that in one folder. From your description, the backup job is
CPU bound with all those IOPS. Do you really have over 500K files in
just one folder? You never considered creating a hierarchy of
subfolders to hold groups of those files based on a common criterion
for each subfolder? Just because you can dump hundreds of thousands of
files into a single folder doesn't mean that's a good idea.
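
If you do decide to break that folder up, a script can do the
bucketing. Here is a sketch that moves everything into per-year
subfolders by modification date; the path is an example, and you would
want to try it against a copy first:

  import os
  import shutil
  import time

  folder = r"D:\SlowFolder"   # hypothetical example path

  with os.scandir(folder) as it:
      entries = [e for e in it if e.is_file(follow_symlinks=False)]

  for entry in entries:
      year = time.localtime(entry.stat().st_mtime).tm_year
      dest = os.path.join(folder, str(year))
      os.makedirs(dest, exist_ok=True)
      # a same-volume move is just a rename, so this is cheap even for
      # hundreds of thousands of files
      shutil.move(entry.path, os.path.join(dest, entry.name))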

By the way, in Macrium Reflect, did you configure your backup job to
throttle its use of the CPU? That's to prevent a backup job from
sucking up all the CPU and leaving the computer unusable during the
backup. In a Reflect backup job, you can configure its priority. If you
set it at max (which is still, I believe, below real-time priority),
that process sucks up most of the CPU and leaves little for other
processes, making the computer unusable to you. Even if you schedule
the backup to run when you're not at the computer, other background
processes, like your startup programs, and even the OS itself want some
CPU slices.
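
Reflect's own priority setting is the right place to fix that, but for
what it's worth, you can also knock a running process down to
below-normal priority from the outside. A sketch using the third-party
psutil package; the "reflect" process-name match is a guess on my part,
and changing another process's priority may need an elevated prompt:

  import psutil   # third-party: pip install psutil

  for proc in psutil.process_iter(["name"]):
      name = (proc.info["name"] or "").lower()
      if name.startswith("reflect"):   # guessed process name
          try:
              proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # Windows-only
              print("Lowered priority of", proc.info["name"], "pid", proc.pid)
          except psutil.AccessDenied:
              print("Access denied for", proc.info["name"], "- run elevated")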

The compression level you select for a backup job also dictates how
much CPU it consumes. You will find very little difference in the size
of the backup file between the Medium (recommended) and High
compression levels. The backup job takes a lot longer trying to
compress the backup file harder, but the result is little reduction in
its size, especially for non-compressible file formats like images, so
a lot of CPU time gets wasted for insignificant gain.
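
You can see the shape of that trade-off with any compressor. This is
zlib, not Reflect's compressor, and the data is random bytes standing
in for already-compressed images, so the numbers are only illustrative:

  import os
  import time
  import zlib

  data = os.urandom(50 * 1024 * 1024)   # 50 MB of incompressible bytes

  for level in (6, 9):                  # roughly "medium" vs. "max"
      start = time.perf_counter()
      packed = zlib.compress(data, level)
      elapsed = time.perf_counter() - start
      print(f"level {level}: {len(packed):,} bytes in {elapsed:.2f} s")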

I did not find an option in Reflect to throttle how much bandwidth it
uses on the data bus, like a limit on IOPS. Not for network traffic,
but for how busy it keeps the data bus. If the bus is flooded,
especially by a high[er] priority process, everything else has to wait
to do any other data I/O.