This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Fix sem_post race (bug 14532)
- From: "Carlos O'Donell" <carlos at systemhalted dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Tue, 11 Sep 2012 18:26:27 -0400
- Subject: Re: Fix sem_post race (bug 14532)
- References: <Pine.LNX.4.64.1208302108250.15919@digraph.polyomino.org.uk><1347399660.3374.296.camel@triegel.csb>
On Tue, Sep 11, 2012 at 5:41 PM, Torvald Riegel <triegel@redhat.com> wrote:
>
> On Thu, 2012-08-30 at 21:09 +0000, Joseph S. Myers wrote:
> > Bug 14532 is a race condition in sem_post (generic C version), where
> > it wrongly uses an acquire barrier rather than a release barrier, so
> > wrongly allowing memory operations from before the unlock to be moved
> > after it.
>
> What is your (or anybody else's) opinion about documenting concurrent
> pieces of code in more detail to make issues like this one less likely?
>
> For example, something like:
>
> > 7e9..67e8cc5 100644
> > --- a/nptl/sysdeps/unix/sysv/linux/sem_post.c
> > +++ b/nptl/sysdeps/unix/sysv/linux/sem_post.c
> > @@ -1,5 +1,5 @@
> > /* sem_post -- post to a POSIX semaphore. Generic futex-using version.
> > - Copyright (C) 2003, 2004, 2007, 2008 Free Software Foundation, Inc.
> > + Copyright (C) 2003-2012 Free Software Foundation, Inc.
> > This file is part of the GNU C Library.
> > Contributed by Jakub Jelinek <jakub@redhat.com>, 2003.
> >
> > @@ -40,7 +40,7 @@ __new_sem_post (sem_t *sem)
> > return -1;
> > }
> > }
> > - while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1,
> > cur));
> > + while (atomic_compare_and_exchange_bool_rel (&isem->value, cur + 1,
> > cur));
>
> /* This rel memory order synchronizes with the acq memory order in
> atomic_decrement_if_positive() called by waiters. We don't need acq_rel
> here because even though we can race with other posters, we only use the
> CAS to modify isem->value, so reading an outdated value of isem->value
> does not matter. */
I would accept such a patch.
I don't like the use of `acq_rel' and would prefer we come up with some
consistent single name for an operation that has both acquire and release
semantics (totally bike-shed though).
I'm strongly in favour of such comments in the code, in fact I wish all
atomic sequences had such descriptions. Writing them up is quite a bit of
work but goes a long way towards documenting expectations for future
developers.
> Side notes:
> - Could not having acq_rel matter on ARM? I've never read the ARM memory
> model, but heard that there's no guarantee that the most recent value
> would get picked up eventually. Although I suspect that the CAS would
> do.
I don't have an example where not having acq_rel would matter.
In my mind these kinds of problems are best discussed with a specific example,
otherwise it's purely speculation.
> - Should atomic_decrement_if_positive() document the acq memory order
> that is has (and must have)?
>
> > atomic_full_barrier ();
>
> Why do we have the full barrier after the rel barrier? Is it necessary?
> Was it meant to have the affect of the rel barrier, but was incorrectly
> placed after the CAS?
I don't know why there is an atomic_full_barrier(), at first glance it
seems unneeded, but as a conservative project we'd have to prove that
it's not needed before removing it. It appears to have been present since
the function was rewritten in 2007 by Ulrich.
Cheers,
Carlos.