This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix race between sem_post and semaphore destruction [BZ #12674]


On Thu, May 22, 2014 at 10:10:06PM +0200, Torvald Riegel wrote:
> I haven't looked at musl, so I'll take it for what it seems to be:
> adding a state bit that's set whenever there are threads waiting on a
> variable that is also used by a futex.  (That's similar to the value 2
> used in the mutex impl.)

IIRC It is the same as the approach I have used, except for resetting
the semaphore value when there are no waiters, which is why I credited
Rich with the idea.

> That sounds okay, but I don't think it's sufficient to show that your
> changes still work.  To do that, I find it useful to think about (and
> document!) what the intent behind each piece of the algorithm is.  IOW,
> what does it do on an abstract level?  What's the overall state flow?
> Which pieces of the algorithm prevent certain states (so that the thing
> becomes manageable, overall).  What patterns do we follow?  So, discuss
> more of the why, not just the how.  If you can't explain it elegantly
> and intuitively, there's a good chance something isn't well
> understood :)

I attempted to document the algorithm in sem_wait.c, but while doing
so I found some flaws (including the one you pointed out), so I have
to go back to the drawing board on this.  I'll respond to your
questions though, hoping that they'll expose more flaws in my
understanding.

> The only thing you test below is that *this thread* saw the flag for the
> waiter-present state.
> However, it's the only place where a wake-up for 1 thread happened -- so
> it should actually see and act on *all* necessary wake-ups.  Which I
> believe isn't the case with your changes.

Right, subsequent posters won't wake any threads since they see a
positive semaphore value, increment it and return, possibly resulting
in perennially sleeping threads.  This is a problem.

> Please document why the 0 -> -1 transition is always ok.  (This CAS can
> occur anywhere in an execution, assuming that an incoming waiter is
> allowed.)  The overview documentation of the algorithm would be a good
> place to explain that, given that this can happen in a few places.

A semaphore value of 0 can occur only in the following cases:

- Initial state when there were no earlier waiters or posters
- The last waiter exited (either by decrementing the value or resetting it)

If a poster increments this value in the meantime (resulting in this 0
-> -1 transition failing or being overwritten), it will result in the
futex_wait returning with EWOULDBLOCK and the subsequent CAS should
get us the posted value.

> >    atomic_increment (&isem->nwaiters);
> 
> What about the memory orders here?  Why is the _acq above
> sufficient/required/insufficient?

The order between incrementing nwaiters and setting of the semaphore
value does not matter because we anyway cannot avoid the race between
two waiters that result in an extra futex_wait.

> > +     actually happen only when we exit due to an error.  */
> > +  if (atomic_decrement_and_test (&isem->nwaiters))
> > +    atomic_compare_and_exchange_val_acq (&isem->value, 0, -1);
> 
> ... like above, the CAS can happen at any time; the waiters has already
> exited (fetch_dec on nwaiters), so it can be suspended, and some time
> later do the -1 -> 0 transition.  Why is that okay?

It is not OK because it it could result in sem_post not waking a
waiter since it will see the 0.

> >  
> > +  /* If we are the only know waiter right now, indicate that by setting the
> 
> How do you know that's the case?

Actually we don't care if that is the case.  It's just that we need to
explicitly set it when we are the first waiter.

> > +  if (atomic_decrement_and_test (&isem->nwaiters))
> > +    atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);
> 
> Doesn't this reset to -1?

Typo, thanks for catching that.

Siddhesh

Attachment: pgphd6O0gpkP4.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]