Extreme slowdown due to malloc?
Achim Gratz
Stromeko@nexgo.de
Sat Jan 2 14:17:31 GMT 2021
ASSI writes:
>> I have a Cygwin malloc speedup patch that *might* help the m-t part.
>> I'll prepare and submit that to cygwin-patches shortly.
>
> Well, if you want to test it with the new ZStandard, give it a spin…
> I'll check how far I can strip that test down so you can use the Cygwin
> source tree for testing.
OK, it's actually pretty simple. Do this inside a checkout of
newlib-cygwin:
$ find newlib winsup texinfo -type f > flist
$ zstd --train-cover --ultra -22 -T0 -vv --filelist=flist -o dict-cover
On Linux, it reads in all the files in about two seconds, while it takes
quite a while longer on Cygwin. But the real bummer is that
constructing the partial suffix arrays (which is single-threaded)
seemingly takes forever on Cygwin, while it's done much faster on Linux.
You can pare down the number of files like this:
$ shuf -n 320 flist > slist
and then use that shorter file list (slist) instead. I get this (all
times in seconds):
*** Linux E3-1225v3 4C/4T 3.2/3.6GHz
|------+----------+-------+----------+---------+--------+--------+------------|
|    n |     user |   sys |    total |    wall |   util | serial | pagefaults |
|------+----------+-------+----------+---------+--------+--------+------------|
|  100 |  116.092 | 0.187 |  116.279 |   30.82 | 377.3% |   2.0% |          0 |
|  200 |  145.481 | 0.135 |  145.616 |   38.65 | 376.8% |   2.1% |          0 |
|  400 |  288.341 | 0.414 |  288.755 |   77.84 | 371.0% |   2.6% |          0 |
|  800 |  517.288 | 0.623 |  517.911 |  138.93 | 372.8% |   2.4% |          0 |
| 1600 | 1229.348 | 1.752 | 1231.100 |  333.37 | 369.3% |   2.8% |          0 |
| 3200 | 2508.250 | 3.632 | 2511.882 |  678.96 | 370.0% |   2.7% |          0 |
| 6400 | 4380.693 | 5.352 | 4386.045 | 1176.43 | 372.8% |   2.4% |          0 |
|------+----------+-------+----------+---------+--------+--------+------------|
*** Cygwin E3-1276v3 4C/8T 3.6/4.0GHz
|------+----------+--------+----------+---------+--------+--------+------------|
|    n |     user |    sys |    total |    wall |   util | serial | pagefaults |
|------+----------+--------+----------+---------+--------+--------+------------|
|  100 |  141.906 |  0.796 |  142.702 |   20.53 | 695.1% |   2.2% |     327860 |
|  200 |  198.140 |  1.328 |  199.468 |   29.39 | 678.7% |   2.6% |     452870 |
|  400 |  425.749 |  2.328 |  428.077 |   66.03 | 648.3% |   3.3% |     752357 |
|  800 |  822.250 |  3.499 |  825.749 |  150.42 | 549.0% |   6.5% |    1277198 |
| 1600 | 1773.578 |  8.483 | 1782.061 |  383.42 | 464.8% |  10.3% |    3011298 |
| 3200 | 4322.281 | 15.890 | 4338.171 | 1292.92 | 335.5% |  19.8% |    5746903 |
| 6400 | 8499.750 | 29.437 | 8529.187 | 3275.66 | 260.4% |  29.6% |   10543919 |
|------+----------+--------+----------+---------+--------+--------+------------|
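For anyone wanting to re-derive the util and serial columns: the numbers
in the two tables above appear consistent with util = total / wall and
with a serial fraction obtained by solving Amdahl's law for the measured
speedup. The sketch below rests on that reading; the thread counts (4 and
8) are taken from the "4C/4T" and "4C/8T" headings, and the function names
are made up for illustration.

```python
def utilization(total_cpu_s, wall_s):
    """CPU utilization as a ratio (3.773 means 377.3%)."""
    return total_cpu_s / wall_s


def serial_fraction(speedup, threads):
    """Solve Amdahl's law, speedup = 1 / (s + (1 - s) / threads), for s.

    Assumption: the measured utilization is used as the speedup.
    """
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# (host, threads, n, total CPU seconds, wall seconds), copied from the tables.
ROWS = [
    ("Linux", 4, 100, 116.279, 30.82),
    ("Linux", 4, 6400, 4386.045, 1176.43),
    ("Cygwin", 8, 100, 142.702, 20.53),
    ("Cygwin", 8, 6400, 8529.187, 3275.66),
]

for host, threads, n, total, wall in ROWS:
    util = utilization(total, wall)
    serial = serial_fraction(util, threads)
    print(f"{host:6} n={n:5d} util={util * 100:6.1f}% serial={serial * 100:5.1f}%")
```

Read this way, the serial share on Linux stays between 2% and 3% for all
n, while on Cygwin it grows from about 2% to almost 30%.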
So even with a smaller number of files (where the serial portion of the
code is not dominating yet) you can see that the faster machine already
expends more cycles. The differences strongly suggest that those
pagefaults account for the main portion of that extra time. The last
column of the table below is the time per pagefault in µs, assuming that
all of the extra time was spent there. That assumption is obviously not
quite correct, since the number should be roughly constant if it held,
but it's close enough to uphold the original hypothesis.
*** Linux vs. Cygwin
|------+--------------+--------------+----------------+--------------+------------+------|
|    n |  Linux total | Linux scaled |   Cygwin total | Cygwin-Linux | pagefaults | t/pf |
|------+--------------+--------------+----------------+--------------+------------+------|
|  100 |      116.279 |      104.651 |        142.702 |       38.051 |     327860 | 116. |
|  200 |      145.616 |      131.054 |        199.468 |       68.414 |     452870 | 151. |
|  400 |      288.755 |      259.880 |        428.077 |      168.197 |     752357 | 224. |
|  800 |      517.911 |      466.120 |        825.749 |      359.629 |    1277198 | 282. |
| 1600 |     1231.100 |     1107.990 |       1782.061 |      674.071 |    3011298 | 224. |
| 3200 |     2511.882 |     2260.694 |       4338.171 |     2077.477 |    5746903 | 361. |
| 6400 |     4386.045 |     3947.441 |       8529.187 |     4581.746 |   10543919 | 435. |
|------+--------------+--------------+----------------+--------------+------------+------|
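The arithmetic behind the table above can be sketched as follows. The
"Linux scaled" values match the Linux total multiplied by 0.9; that this
factor stands for the clock ratio of the two machines (3.6 GHz / 4.0 GHz)
is an assumption, as the scaling is not spelled out. The function and
constant names are made up for illustration.

```python
# Assumption: 0.9 reproduces the "Linux scaled" column and is read here
# as the ratio of the two machines' clock speeds.
CLOCK_SCALE = 0.9


def time_per_pagefault_us(linux_total_s, cygwin_total_s, pagefaults,
                          scale=CLOCK_SCALE):
    """Extra Cygwin CPU time per pagefault in microseconds.

    Attributes all of the extra time to pagefaults, which is the
    simplifying assumption the t/pf column is based on.
    """
    extra_s = cygwin_total_s - linux_total_s * scale
    return extra_s / pagefaults * 1e6


# (n, Linux total s, Cygwin total s, Cygwin pagefaults), copied from the table.
ROWS = [
    (100, 116.279, 142.702, 327860),
    (800, 517.911, 825.749, 1277198),
    (6400, 4386.045, 8529.187, 10543919),
]

for n, linux_total, cygwin_total, pagefaults in ROWS:
    t_pf = time_per_pagefault_us(linux_total, cygwin_total, pagefaults)
    print(f"n={n:5d} t/pf={t_pf:6.1f} us")
```

If pagefaults were the only cost, t/pf would stay flat; the growth with n
is what hints at additional overhead beyond the faults themselves.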
Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada