This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/13165] New: pthread_cond_wait() can consume a signal that was sent before it started waiting


http://sourceware.org/bugzilla/show_bug.cgi?id=13165

             Bug #: 13165
           Summary: pthread_cond_wait() can consume a signal that was sent
                    before it started waiting
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: mihaylov.mihail@gmail.com
    Classification: Unclassified


I was implementing something like a monitor on top of pthread condition
variables and I observed some strange behaviour. I was always holding the mutex
when calling pthread_cond_signal(). My code relied on only two assumptions
about the way pthread_cond_signal() works:

1) A call to pthread_cond_signal() will wake at least one thread which is
blocked on the condition, and the woken threads will start waiting on the
mutex.

2) If the signaling thread holds the mutex when it calls pthread_cond_signal(),
only threads which are already waiting on the condition variable may be woken.
In particular, if the signaling thread releases the mutex and then another
thread acquires the mutex and calls pthread_cond_wait(), the waiting thread
cannot be woken by this signal, no matter what other waiters are present before
or after the signal.

The only explanation that I could find for the observed behaviour was that my
second assumption was wrong. It seemed that I was hitting the following
scenario:

1) We have several threads which are blocked on the condvar in
pthread_cond_wait(). I'll call these threads "group A".

2) We then send N signals from another thread, holding the mutex for each
signal. We release the mutex and reacquire it between signals.

3) Next, several more threads (at least two) acquire the mutex and enter
pthread_cond_wait(). I'll call these threads "group B".

4) Then we acquire the mutex in the signaling thread again and call
pthread_cond_signal() just once, then we release the mutex.

5) Two threads from group B wake up, and only N-1 threads from group A wake
up. In effect, one of the threads from group B has stolen, from a thread in
group A, a signal that was sent before the group B thread started waiting.

My expectation in this scenario is that at least N threads from group A should
wake up. I don't expect exactly one thread from group B to wake up, because
spurious wakeups are possible. But this is not a spurious wakeup: the number
of woken threads matches the number of signals, it's just that the wrong
threads are woken.
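
The call sequence in steps 1-5 can be sketched as a small test harness. This
is only an illustration of the ordering, not code from my project: the group
sizes, the value of N, and the names waiter(), signal_once() and
run_scenario() are all hypothetical, and the 100 ms pause is an arbitrary
settling time. The waiters deliberately call pthread_cond_wait() without a
predicate loop so that raw wakeups can be counted per group.

```c
#include <pthread.h>
#include <time.h>

#define GROUP_A 4   /* waiters present before the first N signals (assumed) */
#define GROUP_B 2   /* waiters that arrive after those signals (assumed) */
#define N       2   /* signals sent while only group A is waiting (assumed) */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static int entered;       /* threads that have reached pthread_cond_wait() */
static int counting;      /* nonzero while wakeups are attributed to signals */
static int woken[2];      /* wakeups seen per group: [0] = A, [1] = B */
static int group_id[2] = { 0, 1 };

static void pause_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *waiter(void *arg)
{
    int group = *(int *)arg;

    pthread_mutex_lock(&mutex);
    entered++;
    /* Deliberately no predicate loop: raw wakeups are what we count. */
    pthread_cond_wait(&cond, &mutex);
    if (counting)
        woken[group]++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Poll until n threads have entered pthread_cond_wait(). */
static void wait_until_entered(int n)
{
    for (;;) {
        pthread_mutex_lock(&mutex);
        int e = entered;
        pthread_mutex_unlock(&mutex);
        if (e >= n)
            return;
        pause_ms(1);
    }
}

/* Send one signal with the mutex held, then release the mutex. */
static void signal_once(void)
{
    pthread_mutex_lock(&mutex);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
}

/* Runs steps 1-5 once. Returns wakeups in group A; stores group B's in *b. */
int run_scenario(int *b)
{
    pthread_t t[GROUP_A + GROUP_B];
    int i, a;

    entered = 0;
    counting = 1;
    woken[0] = woken[1] = 0;

    for (i = 0; i < GROUP_A; i++)                 /* step 1 */
        pthread_create(&t[i], NULL, waiter, &group_id[0]);
    wait_until_entered(GROUP_A);

    for (i = 0; i < N; i++)                       /* step 2 */
        signal_once();

    for (i = GROUP_A; i < GROUP_A + GROUP_B; i++) /* step 3 */
        pthread_create(&t[i], NULL, waiter, &group_id[1]);
    wait_until_entered(GROUP_A + GROUP_B);

    signal_once();                                /* step 4 */
    pause_ms(100);                                /* let wakeups land */

    pthread_mutex_lock(&mutex);                   /* step 5: read the counts */
    counting = 0;
    a = woken[0];
    *b = woken[1];
    pthread_cond_broadcast(&cond);                /* release remaining waiters */
    pthread_mutex_unlock(&mutex);

    for (i = 0; i < GROUP_A + GROUP_B; i++)
        pthread_join(t[i], NULL);
    return a;
}
```

A run that returns fewer than N for group A while *b is 2 or more would match
what I am describing. The window is narrow, so a single run will usually not
hit it; the harness would have to be run many times, probably under load.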

I ran some experiments and they seemed to confirm my theory, so I looked at the
condvar implementation in nptl. I'm new to POSIX and Linux programming, but I
think I see how this can happen:

1) When we send the first N signals, N threads from group A that are waiting on
the cond->__data.__futex are woken and start waiting on cond->__data.__lock.

2) Then while the threads from group B enter pthread_cond_wait, some of the
woken threads from group A may remain waiting on the lock.

3) When we send the last signal, one thread from group B will wake and consume
this signal.

4) But suppose one more thread from group B wakes spuriously from
lll_futex_wait. At this moment it is possible that some of the woken threads
from group A will still be waiting on cond->__data.__lock. In that case the
spuriously woken thread from group B will see that cond->__data.__wakeup_seq
has changed (because of the last signal) and cond->__data.__woken_seq has not
reached cond->__data.__wakeup_seq (because some of the woken threads in group A
are still waiting to acquire cond->__data.__lock), so it will exit the retry
loop and increase cond->__data.__woken_seq. The result is that the thread will
steal the signal.
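
To make the counter arithmetic concrete, here is a single-threaded model of
the exit test as I understand it. It is a simplification, not the nptl code:
it ignores the futex, the internal lock and broadcasts, and the struct and
function names are hypothetical. The total_seq counter is my assumption about
how the implementation decides whether a signal has any waiter left to wake.
The replay uses N = 2 and applies the wakeups in the order described in the
steps above.

```c
#include <stdbool.h>

/* Simplified model of the condvar sequence counters (names hypothetical). */
struct cond_model {
    unsigned long total_seq;   /* waiters that have started waiting (assumed) */
    unsigned long wakeup_seq;  /* signals sent so far */
    unsigned long woken_seq;   /* waiters that have consumed a signal */
};

/* A waiter enters the wait and remembers the current wakeup_seq. */
unsigned long model_enter_wait(struct cond_model *c)
{
    c->total_seq++;
    return c->wakeup_seq;
}

/* A signal counts only if some waiter has not been signalled yet. */
void model_signal(struct cond_model *c)
{
    if (c->total_seq > c->wakeup_seq)
        c->wakeup_seq++;
}

/*
 * A waiter has returned from the futex wait (for any reason) and re-checks
 * the counters. It leaves the retry loop only if wakeup_seq has moved since
 * it entered and not every signal has been consumed yet.
 */
bool model_try_exit(struct cond_model *c, unsigned long seq)
{
    if (c->wakeup_seq == seq || c->woken_seq == c->wakeup_seq)
        return false;          /* keep waiting */
    c->woken_seq++;
    return true;               /* consume one signal */
}

/* Replays the scenario with N = 2. Returns group A exits; *b gets group B's. */
int model_replay(int *b)
{
    struct cond_model c = { 0, 0, 0 };
    int a = 0;

    unsigned long a1 = model_enter_wait(&c);
    unsigned long a2 = model_enter_wait(&c);
    model_signal(&c);          /* the first N signals: both group A threads */
    model_signal(&c);          /* are woken but stall on the internal lock */

    unsigned long b1 = model_enter_wait(&c);
    unsigned long b2 = model_enter_wait(&c);
    model_signal(&c);          /* the last signal */

    *b = 0;
    *b += model_try_exit(&c, b1);  /* woken by the last signal */
    *b += model_try_exit(&c, b2);  /* spurious futex wakeup */
    a  += model_try_exit(&c, a1);  /* group A finally gets the lock */
    a  += model_try_exit(&c, a2);  /* finds every signal already consumed */
    return a;
}
```

In this model both group B threads pass the exit test, the first group A
thread passes it, and the second group A thread goes back to waiting, which is
the N-1 outcome from the scenario.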

Is this scenario really possible? And if it is, is this on purpose or is it a
bug?


