  #45  
May 11th 20, 06:54 PM, posted to alt.comp.os.windows-10,comp.sys.ibm.pc.hardware.storage
Paul[_28_]
Why is this folder so slow? (follow-up)

Yousuf Khan wrote:
On 4/26/2020 9:24 PM, Yousuf Khan wrote:
I have a folder on one of my SSD drives that takes 8 to 10 hours to
back up. It is only about 1.4 GB, but it is allocated 2.4 GB of space
altogether, and there are 580,000 files here. That indicates each file
is using a little over half of a cluster on average. File system is
NTFS.

Meanwhile, this same drive can backup the remainder of the drive in
under 2 hours, and the remainder of the drive is 390 GB! Is NTFS this
inefficient for small files like this?
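
The numbers above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal gigabytes, the usual 4 KiB NTFS cluster size (the post does not state the cluster size), and 9 hours as the midpoint of the "8 to 10 hours" figure. The function names are made up for illustration.

```python
# Figures quoted in the post. Decimal GB and a 4 KiB cluster are assumptions.
GB = 1000**3
CLUSTER = 4096  # assumed NTFS default cluster size, not stated in the post


def per_file_stats(data_bytes, allocated_bytes, file_count, cluster=CLUSTER):
    """Return (average file size, average allocation, fraction of a cluster used)."""
    avg_size = data_bytes / file_count
    avg_alloc = allocated_bytes / file_count
    return avg_size, avg_alloc, avg_size / cluster


def throughput_mb_per_s(total_bytes, hours):
    """Average backup rate in MB/s over the given number of hours."""
    return total_bytes / 1e6 / (hours * 3600)


avg_size, avg_alloc, fill = per_file_stats(1.4 * GB, 2.4 * GB, 580_000)
print(f"average file size:  {avg_size:,.0f} bytes")
print(f"average allocation: {avg_alloc:,.0f} bytes")
print(f"cluster fill:       {fill:.0%}")
print(f"small-file folder:  {throughput_mb_per_s(1.4 * GB, 9):.3f} MB/s")
print(f"rest of the drive:  {throughput_mb_per_s(390 * GB, 2):.1f} MB/s")
```

Under those assumptions each file averages roughly 2.4 KB and occupies about one cluster, and the small-file folder backs up at a few hundredths of a MB/s against roughly 54 MB/s for the rest of the drive. The cost is per file, not per byte.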

Yousuf Khan


Okay, so after fixing the problem with my News folder, I kept
researching what these millions of little files were, that were clogging
up my News folder. The files had an extension of WDSEML. Later I found
out that these same files are also there in Email folders, hundreds of
thousands of them too.

Initially, I thought that these must be the bodies of the messages that
Thunderbird uses to store emails and newsgroup messages. But after a bit
of research, I found out that Thunderbird itself has no use for these
files. Thunderbird does generate them, but it doesn't use them itself.
Instead they are generated only for the benefit of Windows' Search and
Indexing application. Windows Search uses them to let you search
messages through the Windows Search box. So once Thunderbird generates
these files for Windows Search, it has no further use for them, as it
stores its own internal data in a different set of files. In fact, these
WDSEML files are saved copies of individual messages out of
Thunderbird's own database. So Thunderbird maintains its own database,
but it never cleans up these copies. WDSEML means "Windows Desktop
Search Email", in fact. I also think this is a problem specific to
Thunderbird under Windows; it probably isn't an issue in Thunderbird
under other OSes like Linux.

You can easily delete all of these files, but of course Thunderbird
will generate new ones as messages come in. So what you have to do is
tell Thunderbird not to generate these files for Windows anymore. You go
into Thunderbird's options menu and turn it off (Tools → Options, then
select Advanced → General → System Integration → Allow Windows search to
search messages).

https://fileinfo.com/extension/wdseml

You can also delete them more easily by searching for and deleting just
the folders in which they reside, rather than the individual files.
These folders have the extension .MOZMSGS.
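
Before deleting anything, it may help to see how many of these files each folder holds. The following is a minimal sketch that only reports and never deletes. The function names are hypothetical, and the profile path has to be supplied by you, since the post does not say where the profile lives.

```python
import os


def find_mozmsgs_dirs(profile_root):
    """Yield (folder_path, wdseml_count) for each *.mozmsgs folder under profile_root."""
    for dirpath, dirnames, filenames in os.walk(profile_root):
        if dirpath.lower().endswith(".mozmsgs"):
            count = sum(1 for name in filenames if name.lower().endswith(".wdseml"))
            yield dirpath, count


def report(profile_root):
    """Print each .mozmsgs folder with its file count; return the total. Deletes nothing."""
    total = 0
    for path, count in find_mozmsgs_dirs(profile_root):
        print(f"{count:>8}  {path}")
        total += count
    return total
```

Calling report() on the Thunderbird profile folder prints one line per .mozmsgs folder. Once the System Integration option is turned off and Thunderbird is closed, the listed folders are the ones to remove.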

Yousuf Khan


In the business that would be called a "lazy implementation".

All they would have to do is write a "search provider", and Windows
could use that to pull the messages in an OLE fashion. It could have
been done with no temporary files at all (flowing from the MORK file
or MBOX or whatever, right into Windows.edb, in terms of writes).

But that would also put too much Windows-ecosystem code into
the tool, which is a no-no in cross platform tool design. You
have to keep your "philosophical purity" at all costs. Which means
using OpenGL for graphics (cross platform), instead of DirectX and X11
as separate platform interfaces.

I guess there's some benefit to federated search that includes
your email, but to my way of thinking this would only clutter up
a search result later.

Then you'd find yourself typing this in the File Explorer search box:

file:mytaxes.xlsx

instead of

mytaxes

because in the latter one, 500K of your emails are
going to get searched too. Using the file: keyword
would help staunch the mess inside the federated
database. With the second search, the results would likely
scroll off the screen, burying the thing you really
wanted.

You might also discover the Windows.edb file is bloated
beyond recognition, because of that file set. It might
be around 1 GB for a vanilla install, but after that
Thunderbird thing got indexed, it would likely double
at the very least.

You can rebuild the Windows.edb index file, using
the Indexing Options control panel in Windows 10.
I would give that a whirl after the TB folder has
had all the cruft removed. It'll take about three
hours to index the regular C: files (but this assumes
you've customized the searched folders to include
most of C: , versus the very shallow folder set used
by default).

Even finding Windows.edb is hard :-) The File Explorer
search won't allow you to find it. You'll need Agent
Ransack or Everything.exe to find that file, just so
you can see the current size, and decide whether it
needs a rebuild or not.
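
If you would rather script the size check, something like this would do. Treat it as a sketch: the path below is where I'd expect the file on a stock Windows 10 install, which is an assumption and not something stated above, and reading it may need an elevated prompt. The 2 GB threshold is just the "double a vanilla install" guess from earlier, not a documented limit.

```python
import os

# Assumed default location on Windows 10; adjust if your index was relocated.
DEFAULT_EDB = r"C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb"


def file_size_gb(path):
    """Return the size of path in GB (10**9 bytes), or None if it cannot be read."""
    try:
        return os.path.getsize(path) / 1e9
    except OSError:
        return None


def looks_bloated(size_gb, threshold_gb=2.0):
    """True if the index exceeds the (hypothetical) threshold; False if unknown."""
    return size_gb is not None and size_gb > threshold_gb
```

For example, looks_bloated(file_size_gb(DEFAULT_EDB)) gives a quick yes/no on whether a rebuild is worth the three hours.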

Aren't computers wonderful? Such labor saving. "It
slices, it dices, it makes Julienne Fries." I don't
think I've ever made Julienne Fries, but I bet
Windows 10 has done all the pre-work for that,
over and over and over again...

Paul