Extreme slowdown due to malloc?

Achim Gratz Stromeko@nexgo.de
Sat Jan 2 14:17:31 GMT 2021


ASSI writes:
>> I have a Cygwin malloc speedup patch that *might* help the m-t part.
>> I'll prepare and submit that to cygwin-patches shortly.
>
> Well, if you want to test it with the new ZStandard, give it a spin…
> I'll check how far I can strip that test down so you can use the Cygwin
> source tree for testing.

OK, it's actually pretty simple, do this inside a checkout of
newlib-cygwin:

$ find newlib winsup texinfo -type f > flist
$ zstd --train-cover --ultra -22 -T0 -vv --filelist=flist -o dict-cover

On Linux, it reads in all the files in about two seconds, while it takes
quite a while longer on Cygwin.  But the real bummer is that
constructing the partial suffix arrays (which is single-threaded) will
seemingly take forever on Cygwin, while it finishes much faster on
Linux.  You can pare down the number of files like this:

$ shuf -n 320 flist > slist

and then use that shorter list (slist) instead of flist.  I get this:

*** Linux E3-1225v3 4C/4T 3.2/3.6GHz
|------+----------+-------+----------+---------+--------+--------+------------|
|    n |     user |   sys |    total |    wall |   util | serial | pagefaults |
|------+----------+-------+----------+---------+--------+--------+------------|
|  100 |  116.092 | 0.187 |  116.279 |   30.82 | 377.3% |   2.0% |          0 |
|  200 |  145.481 | 0.135 |  145.616 |   38.65 | 376.8% |   2.1% |          0 |
|  400 |  288.341 | 0.414 |  288.755 |   77.84 | 371.0% |   2.6% |          0 |
|  800 |  517.288 | 0.623 |  517.911 |  138.93 | 372.8% |   2.4% |          0 |
| 1600 | 1229.348 | 1.752 | 1231.100 |  333.37 | 369.3% |   2.8% |          0 |
| 3200 | 2508.250 | 3.632 | 2511.882 |  678.96 | 370.0% |   2.7% |          0 |
| 6400 | 4380.693 | 5.352 | 4386.045 | 1176.43 | 372.8% |   2.4% |          0 |
|------+----------+-------+----------+---------+--------+--------+------------|

*** Cygwin E3-1276v3 4C/8T 3.6/4.0GHz 
|------+----------+--------+----------+---------+--------+--------+------------|
|    n |     user |    sys |    total |    wall |   util | serial | pagefaults |
|------+----------+--------+----------+---------+--------+--------+------------|
|  100 |  141.906 |  0.796 |  142.702 |   20.53 | 695.1% |   2.2% |     327860 |
|  200 |  198.140 |  1.328 |  199.468 |   29.39 | 678.7% |   2.6% |     452870 |
|  400 |  425.749 |  2.328 |  428.077 |   66.03 | 648.3% |   3.3% |     752357 |
|  800 |  822.250 |  3.499 |  825.749 |  150.42 | 549.0% |   6.5% |    1277198 |
| 1600 | 1773.578 |  8.483 | 1782.061 |  383.42 | 464.8% |  10.3% |    3011298 |
| 3200 | 4322.281 | 15.890 | 4338.171 | 1292.92 | 335.5% |  19.8% |    5746903 |
| 6400 | 8499.750 | 29.437 | 8529.187 | 3275.66 | 260.4% |  29.6% |   10543919 |
|------+----------+--------+----------+---------+--------+--------+------------|
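
The util and serial columns in both tables appear consistent with
util = total / wall and an Amdahl-style serial fraction derived from
util and the number of hardware threads (4 on the Linux box, 8 on the
Cygwin box).  The post does not state how these columns were computed,
so the following is only a sketch under that assumption; the helper name
util_serial is made up for illustration.

```shell
# Hypothetical helper: derive "util" and "serial" from the total CPU time,
# the wall time and the number of hardware threads.
# Assumption: serial = (1/util - 1/threads) / (1 - 1/threads), which is
# inferred from the numbers in the tables, not stated in the post.
util_serial() {
    awk -v total="$1" -v wall="$2" -v threads="$3" 'BEGIN {
        util = total / wall    # average number of busy threads
        serial = (1 / util - 1 / threads) / (1 - 1 / threads)
        printf "%.1f%% %.1f%%\n", util * 100, serial * 100
    }'
}

util_serial 116.279 30.82 4       # Linux,  n = 100:  prints 377.3% 2.0%
util_serial 8529.187 3275.66 8    # Cygwin, n = 6400: prints 260.4% 29.6%
```

Read this way, the serial fraction stays between 2% and 3% on Linux
across all n, while on Cygwin it climbs from about 2% to almost 30%.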

So even with a smaller number of files (where the serial portion of the
code is not dominating yet) you can see that the faster machine already
expends more cycles.  The differences strongly indicate that those
pagefaults account for the main portion of that extra time.  The last
column of the table below is the time per pagefault in µs, assuming that
all of the extra time was spent there.  This is obviously not quite
correct, as that number should be roughly constant if the assumption
held, but it's close enough to uphold the original hypothesis.

*** Linux vs. Cygwin
|------+--------------+--------------+----------------+--------------+------------+------|
|    n | Linux  total | Linux scaled | Cygwin   total | Cygwin-Linux | pagefaults | t/pf |
|------+--------------+--------------+----------------+--------------+------------+------|
|  100 |      116.279 |      104.651 |        142.702 |       38.051 |     327860 | 116. |
|  200 |      145.616 |      131.054 |        199.468 |       68.414 |     452870 | 151. |
|  400 |      288.755 |      259.880 |        428.077 |      168.197 |     752357 | 224. |
|  800 |      517.911 |      466.120 |        825.749 |      359.629 |    1277198 | 282. |
| 1600 |     1231.100 |     1107.990 |       1782.061 |      674.071 |    3011298 | 224. |
| 3200 |     2511.882 |     2260.694 |       4338.171 |     2077.477 |    5746903 | 361. |
| 6400 |     4386.045 |     3947.441 |       8529.187 |     4581.746 |   10543919 | 435. |
|------+--------------+--------------+----------------+--------------+------------+------|
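
The derived columns can be recomputed from the two totals and the
pagefault count.  In the table, "Linux scaled" equals 0.9 times "Linux
total" and "Cygwin-Linux" is the Cygwin total minus that scaled value.
The factor 0.9 is read off the table; that it compensates for the clock
difference between the two machines is an assumption.  The helper name
time_per_pagefault is hypothetical.

```shell
# Hypothetical helper: extra time per pagefault in microseconds.
# Arguments: Linux total [s], Cygwin total [s], Cygwin pagefaults.
# Assumption: the 0.9 scale factor is inferred from the table above.
time_per_pagefault() {
    awk -v linux="$1" -v cygwin="$2" -v pf="$3" 'BEGIN {
        scaled = linux * 0.9          # "Linux scaled"
        extra = cygwin - scaled       # "Cygwin-Linux"
        # seconds per pagefault, converted to microseconds
        printf "%.3f %.3f %.0f\n", scaled, extra, extra / pf * 1000000
    }'
}

time_per_pagefault 116.279 142.702 327860     # n = 100: prints 104.651 38.051 116
time_per_pagefault 517.911 825.749 1277198    # n = 800: prints 466.120 359.629 282
```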



Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

