This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Cancelling pthread_cond_wait() hangs with PRIO_INHERIT mutex


Hi!

I have a problem cancelling threads that are blocked in pthread_cond_wait()
when the associated mutex has the `PTHREAD_PRIO_INHERIT` protocol set. This
only happens on certain platforms. I have also posted this on Stack Overflow
(http://stackoverflow.com/questions/11878445/cancelling-pthread-cond-wait-hangs-with-prio-inherit-mutex),
and from the comments there it seems that glibc is the culprit. I haven't yet
had the chance to try the newest versions (I have been using 2.13 and 2.14),
but if anyone could give a few hints as to whether this might be fixed in
current versions (or isn't broken in an older one), I'd be happy to hear
about it!

The following minimal example demonstrates the problem (compile with
`g++ <filename>.cpp -lpthread`):

#include <pthread.h>
#include <unistd.h>   // sleep()
#include <iostream>

pthread_mutex_t mutex;
pthread_cond_t cond;

// Cancellation cleanup handler: releases the mutex passed in arg
void clean(void *arg) {
    std::cout << "clean: Unlocking mutex..." << std::endl;
    pthread_mutex_unlock((pthread_mutex_t*)arg);
    std::cout << "clean: Mutex unlocked..." << std::endl;
}

void *threadFunc(void *arg) {
    int ret = 0;
    pthread_mutexattr_t mutexAttr;
    ret = pthread_mutexattr_init(&mutexAttr);
    std::cout << "ret = " << ret << std::endl;

    //Comment out the following line, and everything works
    ret = pthread_mutexattr_setprotocol(&mutexAttr, PTHREAD_PRIO_INHERIT);
    std::cout << "ret = " << ret << std::endl;

    ret = pthread_mutex_init(&mutex, &mutexAttr);
    std::cout << "ret = " << ret << std::endl;
    ret = pthread_cond_init(&cond, 0);
    std::cout << "ret = " << ret << std::endl;

    std::cout << "threadFunc: Init done, entering wait..." << std::endl;

    pthread_cleanup_push(clean, (void *) &mutex);
    ret = pthread_mutex_lock(&mutex);
    std::cout << "ret = " << ret << std::endl;
    while(1) {
        // Cancellation point: the thread blocks here until it is cancelled
        ret = pthread_cond_wait(&cond, &mutex);
        std::cout << "ret = " << ret << std::endl;
    }
    pthread_cleanup_pop(1);

    return 0;
}

int main() {
    pthread_t thread;
    int ret = 0;
    ret = pthread_create(&thread, 0, threadFunc, 0);
    std::cout << "ret = " << ret << std::endl;

    std::cout << "main: Thread created, waiting a bit..." << std::endl;
    sleep(2);

    std::cout << "main: Cancelling threadFunc..." << std::endl;
    ret = pthread_cancel(thread);
    std::cout << "ret = " << ret << std::endl;

    std::cout << "main: Joining threadFunc..." << std::endl;
    ret = pthread_join(thread, NULL);
    std::cout << "ret = " << ret << std::endl;

    std::cout << "main: Joined threadFunc, done!" << std::endl;
    return 0;
}


Every time I run it, main() hangs on pthread_join(). A gdb backtrace shows
the following:

    Thread 2 (Thread 0xb7d15b70 (LWP 257)):
    #0  0xb7fde430 in __kernel_vsyscall ()
    #1  0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
    #2  0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
    #3  0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
    #4  0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
    #5  0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    Thread 1 (Thread 0xb7d166d0 (LWP 254)):
    #0  0xb7fde430 in __kernel_vsyscall ()
    #1  0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
    #2  0x0804914a in main () at /home/pthread_cond_wait.cpp:41

If PTHREAD_PRIO_INHERIT isn't set on the mutex, everything works as it
should, and the program exits cleanly.
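
One way to sidestep the cancellation path altogether would be to wake the waiting thread with a flag and pthread_cond_signal() instead of pthread_cancel(). The following is only a rough sketch of that idea, not a fix for the hang itself, and it has not been verified on the affected platforms. All names in it (g_mutex, g_cond, g_stop, waiter, run_stop_flag_demo) are hypothetical and not part of the program above.

```cpp
#include <pthread.h>
#include <iostream>

// Hypothetical names for illustration only.
static pthread_mutex_t g_mutex;
static pthread_cond_t g_cond;
static bool g_stop = false;  // protected by g_mutex

// Waits on the condition variable until g_stop is set, then exits normally.
static void *waiter(void *) {
    pthread_mutex_lock(&g_mutex);
    while (!g_stop) {
        pthread_cond_wait(&g_cond, &g_mutex);
    }
    pthread_mutex_unlock(&g_mutex);
    return 0;
}

// Creates a PTHREAD_PRIO_INHERIT mutex, starts the waiter, asks it to stop
// and joins it. Returns 0 on success, otherwise the first pthread error code.
int run_stop_flag_demo() {
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0) return ret;
    ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (ret == 0) ret = pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (ret != 0) return ret;

    ret = pthread_cond_init(&g_cond, 0);
    if (ret != 0) return ret;

    pthread_t thread;
    ret = pthread_create(&thread, 0, waiter, 0);
    if (ret != 0) return ret;

    // Request shutdown: set the flag under the mutex, then wake the waiter.
    pthread_mutex_lock(&g_mutex);
    g_stop = true;
    pthread_cond_signal(&g_cond);
    pthread_mutex_unlock(&g_mutex);

    ret = pthread_join(thread, 0);

    pthread_cond_destroy(&g_cond);
    pthread_mutex_destroy(&g_mutex);
    return ret;
}
```

Calling run_stop_flag_demo() from main() and printing its return value would show whether the thread could be joined. Because the flag is checked under the mutex before waiting, the wake-up cannot be lost even if the signal arrives before the thread first blocks.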


Platforms with problems:

 - Embedded AMD Fusion board, running a [PTXDist][1] based 32-bit Linux
3.2.9-rt16 (with [RTpatch][2] 16). We are using the newest [OSELAS][3]
i686 cross toolchain (2011.11.1), using gcc 4.6.2, glibc 2.14.1, binutils
2.21.1a, kernel 2.6.39.
 - Same board with the 2011.03.1 toolchain also (gcc 4.5.2 / glibc 2.13 /
binutils 2.18 / kernel 2.6.36).

Platforms with no problems:

 - Our own ARM-board, also running a PTXDist Linux (32-bit 2.6.29.6-rt23),
using OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 /
binutils 2.18 / kernel 2.6.27.
 - My laptop (Intel Core i7), running 64-bit Ubuntu 11.04 (virtualized /
kernel 2.6.38.15-generic), gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils
2.21.0.20110327.
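
When comparing platforms like these, it may help to confirm which glibc is actually loaded at run time, since it can differ from the toolchain's version. Below is a small sketch, assuming a glibc-based system (gnu_get_libc_version() and the __GLIBC__ macros are glibc-specific). The helper names runtime_glibc_version, prio_inherit_roundtrip and print_platform_report are hypothetical.

```cpp
#include <gnu/libc-version.h>  // gnu_get_libc_version(), glibc only
#include <pthread.h>
#include <iostream>
#include <string>

// Version string of the glibc the process is running against.
std::string runtime_glibc_version() {
    return gnu_get_libc_version();
}

// Requests PTHREAD_PRIO_INHERIT on a mutex attribute and reads it back.
// Returns the stored protocol, or -1 if any call fails.
int prio_inherit_roundtrip() {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return -1;
    int protocol = -1;
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0 ||
        pthread_mutexattr_getprotocol(&attr, &protocol) != 0) {
        protocol = -1;
    }
    pthread_mutexattr_destroy(&attr);
    return protocol;
}

// Prints the compile-time and run-time glibc versions and whether the
// PTHREAD_PRIO_INHERIT protocol was accepted by the attribute object.
void print_platform_report() {
    std::cout << "glibc (compile time): " << __GLIBC__ << "."
              << __GLIBC_MINOR__ << std::endl;
    std::cout << "glibc (run time):     " << runtime_glibc_version()
              << std::endl;
    std::cout << "PRIO_INHERIT accepted: "
              << (prio_inherit_roundtrip() == PTHREAD_PRIO_INHERIT ? "yes" : "no")
              << std::endl;
}
```

Running print_platform_report() on each board would give one line per item to paste next to the toolchain details above.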


I have been looking around the net for solutions, and have come across a
few patches that I've tried, but without any effect:

 - [Making the condition variables priority inheritance aware.][4]
 - [Handling EAGAIN from FUTEX_WAIT_REQUEUE_PI][5]


Are we doing something wrong in our code, which just happens to work on
certain platforms, or is this a bug in the underlying systems? If anyone
has any idea about where to look, or knows of any patches or similar to
try out, I'd be happy to hear about it.

Thanks!



  [1]: http://www.ptxdist.org/software/ptxdist/index_en.html
  [2]: https://rt.wiki.kernel.org/index.php/Main_Page
  [3]: http://www.ptxdist.de/oselas/toolchain/index_en.html
  [4]: http://sourceware.org/bugzilla/show_bug.cgi?id=11588
  [5]: http://sourceware.org/git/?p=glibc.git;a=commit;h=c5a0802a682dba23f92d47f0f99775aebfbe2539


Best regards,
Simon Falsig

