This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

potential deadlock in sem_*wait and sem_post for MIPS architectures??


Hi all,


I've encountered the following issue while heavily using glibc's
sem_wait and sem_post simultaneously in a lot of threads: it seems
that the sem_post and sem_wait functions can enter an endless loop in
some corner cases. After diving into the code, it seems that (for the
case of sem_post, nptl/sysdeps/unix/sysv/linux/sem_post.c) the
following code is causing the problem:

int
__new_sem_post (sem_t *sem)
{
  struct new_sem *isem = (struct new_sem *) sem;

  __typeof (isem->value) cur;
  do
    {
      cur = isem->value;
      if (isem->value == SEM_VALUE_MAX)
        {
          __set_errno (EOVERFLOW);
          return -1;
        }
    }
  while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));


The problem occurs because gcc is optimizing this piece of code, and
puts 'isem->value' in a register before the while loop actually
begins. The atomic_compare_and_exchange_bool_acq macro then compares
'cur' (from the register, which is never updated in the loop) with
'isem->value' (from memory, which could be updated by another thread).
When we have multiple threads running and using the same semaphore,
the isem->value might be updated by another thread after the register
is filled with the value from memory, and then we run into an endless
loop...

I was able to fix this (and more issues of the same kind in the other
semaphore functions) by adding a 'volatile' keyword:

int
__new_sem_post (sem_t *sem)
{
  struct new_sem volatile *isem = (struct new_sem *) sem;



Now, I'm not entirely sure what should be the best way forward here.
For some reason I think I must be doing something wrong here, as this
issue should have been seen by others a lot earlier. On the other
hand, the x86 architecture (among others) has architecture-specific
code for this and is not hit by this issue.

Is it my mistake that I'm using the wrong gcc version, or wrong
compiler options? (is it illegal to build glibc with optimizations
turned on for instance?) Or are we looking at a bug that has long been
unseen because the MIPS compiler was not optimizing that much (and it
now is?)

FYI: I'm using gcc 4.2.4.


Thanks for your help,

Mischa


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]