Extreme slowdown due to malloc?

Mark Geisert mark@maxrnd.com
Mon Jan 18 07:07:24 GMT 2021


Hi Achim,
Thank you very much for the detailed instructions and also for the Linux vs 
Cygwin comparison data for all those testcases.

Achim Gratz wrote:
> ASSI writes:
>>> I have a Cygwin malloc speedup patch that *might* help the m-t part.
>>> I'll prepare and submit that to cygwin-patches shortly.
>>
>> Well, if you want to test it with the new ZStandard, give it a spin…
>> I'll check how far I can strip that test down so you can use the Cygwin
>> source tree for testing.

I've now done this.  And I don't see any improvement.  Reasons below...

> OK, it's actually pretty simple, do this inside a checkout of
> newlib-cygwin:
> 
> $ find newlib winsup texinfo -type f > flist
> $ zstd --train-cover --ultra -22 -T0 -vv --filelist=flist -o dict-cover
> 
> On Linux, it reads in all the files in about two seconds, while it takes
> quite a while longer on Cygwin.  But the real bummer is that
> constructing the partial suffix arrays (which is single-threaded) will
> seemingly take forever, while it's done much faster on Linux.  You can
> pare down the number of files like that:
> 
> $ shuf -n 320 flist > slist

I've settled on '-n 1600' for testing.  I'm running these Cygwin tests on a 2C/4T 
i3-something with 8GB memory and an SSD used for filesystem and page file.  Not a 
dog but clearly not a dire-wolf either.

The page fault numbers are comparable to what you've shown for Cygwin on your 
system.  The long pause after zstd prints "Constructing partial suffix array" is 
because zstd is cpu-bound in qsort() for a long time.  No paging during that time.
Then when the statistics start being printed out, that's when the paging 
insanity starts.

What I discovered is that zstd is repeatedly asking malloc() for large memory 
blocks, presumably to mmap files in, then free()ing them.  Any malloc request 256K 
or larger is fulfilled by mmap() rather than enlarging the heap for it.  But 
crucially, there is no mechanism for our malloc to hang on to freed mmap()ed pages 
for future use.  If you free an mmap()ed block, it is munmap()ed immediately.  So 
for zstd's usage pattern you get an incredible number of page faults to satisfy 
the mmap()s, and Windows seems to take a non-trivial bit of time for each mmap().
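
For illustration only, the allocate/touch/free pattern described above can be 
approximated with a small C sketch.  This is not zstd's code: the helper name 
churn_large_blocks() and the 4096-byte page size are assumptions made for the 
example.  With a block size above the mmap threshold, each iteration would be 
expected to map fresh pages, fault them in on first write, and unmap them again 
on free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed page size used as the touch stride; real code would query it. */
#define ASSUMED_PAGE_SIZE 4096

/* Hypothetical helper: allocate, touch, and free one large block per
   iteration.  Returns the number of pages touched, or 0 if malloc fails. */
size_t churn_large_blocks(size_t block_size, int iterations)
{
    size_t touched = 0;

    assert(block_size > 0);
    for (int i = 0; i < iterations; i++) {
        volatile unsigned char *p = malloc(block_size);
        if (p == NULL)
            return 0;
        /* Writing one byte per page forces that page to be faulted in. */
        for (size_t off = 0; off < block_size; off += ASSUMED_PAGE_SIZE) {
            p[off] = 1;
            touched++;
        }
        /* If the block came from mmap(), freeing it hands the pages
           back, so the next iteration has to fault them in again. */
        free((void *)p);
    }
    return touched;
}
```

Calling churn_large_blocks(1 << 20, 1000) under a timer, once with a block size 
below the threshold and once above it, is one way to see how much of the cost 
comes from the map/unmap cycle rather than from the work itself.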

I will be looking at our malloc implementation to see if tuning something can fix 
this behavior.  Adding code is the last resort.
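
To make concrete what "hanging on to freed blocks" could mean, here is a minimal 
application-level sketch of a one-slot cache that keeps the most recently 
released large block for reuse instead of freeing it.  It is not the Cygwin 
malloc change under consideration and not a proposed patch; struct block_cache, 
cache_acquire(), cache_release() and cache_drain() are hypothetical names 
invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-slot cache holding at most one released block. */
struct block_cache {
    void  *block;     /* cached block, or NULL if the slot is empty */
    size_t capacity;  /* size the cached block is known to hold */
};

/* Return the cached block if it is big enough, else fall back to malloc(). */
void *cache_acquire(struct block_cache *c, size_t size)
{
    assert(c != NULL);
    if (c->block != NULL && c->capacity >= size) {
        void *p = c->block;
        c->block = NULL;
        c->capacity = 0;
        return p;
    }
    return malloc(size);
}

/* Keep the released block if the slot is empty or the block is larger
   than the cached one; otherwise free it as usual. */
void cache_release(struct block_cache *c, void *p, size_t size)
{
    assert(c != NULL);
    if (p == NULL)
        return;
    if (c->block == NULL || size > c->capacity) {
        free(c->block);          /* free(NULL) is a no-op */
        c->block = p;
        c->capacity = size;
    } else {
        free(p);
    }
}

/* Give the cached block, if any, back to the allocator. */
void cache_drain(struct block_cache *c)
{
    assert(c != NULL);
    free(c->block);
    c->block = NULL;
    c->capacity = 0;
}
```

The trade-off is the usual one for caches of this kind: a repeated 
acquire/release cycle of similar sizes no longer reaches the allocator, at the 
price of holding one block's worth of memory until cache_drain() is called.
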
Thanks again for the great testcase.

..mark


More information about the Cygwin-apps mailing list