Cygwin multithreading performance

Corinna Vinschen
Wed Dec 9 09:20:00 GMT 2015

On Dec  8 22:10, Mark Geisert wrote:
> I'll give the snapshot a try, thanks, but it's hard for me to see :) how
> eliminating a few hundred locks and unlocks that are 75% under 1 usec anyway
> will help much.  I may have unduly influenced anybody following this by only
> showing detailed stats for the last two locks that I picked at random.
> What I'm more focused on are the locks being heavily used by git and heavily
> contended by the four git threads in OP's testcase.  Here are the caller
> stats for this set of locks...
> lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
> mtx  4908/06002A00F0 lks 155066, ulks 155066,  66359   4395  78895   4742    675
>   caller 0x0100454FAD, count 154792, L, /usr/src/git/builtin/pack-objects.c:1758
>   caller 0x0100454FFC, count 154777, U, /usr/src/git/builtin/pack-objects.c:1769
>   [...]
> mtx  4908/0600FA6860 lks 154745, ulks 154745,  56092   3217  64883  25435   5118
>   caller 0x0100455659, count   1011, L, /usr/src/git/builtin/pack-objects.c:1640
>   caller 0x0100455687, count   1011, U, /usr/src/git/builtin/pack-objects.c:1642
>   caller 0x01004556E0, count 153734, L, /usr/src/git/builtin/pack-objects.c:1628
>   caller 0x010045570E, count 153734, U, /usr/src/git/builtin/pack-objects.c:1630
> mtx  4908/0603498600 lks 316825, ulks 316825, 146254   4986 155345   9686    554
>   caller 0x0100455269, count 196769, L, /usr/src/git/builtin/pack-objects.c:1695
>   caller 0x01004552C4, count  15148, U, /usr/src/git/builtin/pack-objects.c:1705
>   caller 0x0100455478, count 181621, U, /usr/src/git/builtin/pack-objects.c:1702
>   caller 0x010045554C, count 120056, L, /usr/src/git/builtin/pack-objects.c:1834
>   caller 0x010045556E, count 120056, U, /usr/src/git/builtin/pack-objects.c:1837
> If you look at the ">1000" column, you see that these 3 locks account for
> 675+5118+554 = 6347ms = 6.347sec spent waiting for these locks, best case
> for just this 1000+ usec latency bucket.  The next column to the left (a
> faster bucket for latencies 101us up to 1000us) has 4742+25435+9686 = 39863,
> times 101us best case for this bucket, giving 4.026sec spent waiting for
> these locks.
> I probably need to modify my stats gathering program to display the total
> time spent waiting to make it more clear how things add up for each mutex
> across the whole testcase.  Again, these stats are not timing how long locks
> are being held, but rather how long it takes just to acquire the lock.

But acquiring a lock includes the time it actually has to wait for
another thread holding the lock, so you're *also* measuring the time
locks are held, just by another thread.  Locking and unlocking a
pthread_mutex in Cygwin might be slower than using a critical section,
but it's not *that* slow.

The underlying point is that mutexes are not an end in themselves.  They
guard a shared resource.  What we ultimately need to know here is

1. What resource is it the pthread_mutexes are guarding?

2. Why does accessing the resource take so long that threads end up
   waiting on the mutex for as long as they do?


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
