Re: [PATCH roland/lll-futex] Separate Linuxisms from lowlevellock.h, make a generic one


> I have a few comments, 2 about other things I missed before, and 1 about
> the bug I was trying to fix.

Thanks very much for the review.

My inclination is to start out with a version like mine that doesn't
change what the code does today on at least one platform.  (I chose ARM just
because that's what I'm working on first for the non-Linux port where this
refactoring matters.)  Once it's in and being used by ARM (and/or one or
more other machines where the code matches today), then we can do these
other fixes incrementally.  That is: first refactor without change; then fix.

> On 3 Jul 2014, at 00:14, Roland McGrath <roland@hack.frob.com> wrote:
> 
> > +#define __lll_robust_trylock(futex, id) \
> > +  (atomic_compare_and_exchange_val_acq (futex, id, 0) != 0)
> 
> My first thought was that the != 0 is superfluous. Either way you return
> 0 if you changed the memory, and non-zero if you didn't.
> 
> But looking at the call in pthread_mutex_trylock.c, I wonder if both
> patches have this wrong. Should we be returning the old value here?
> That's what (at least) m68k does, so there's a difference that'll need
> resolving.

Indeed this seems clearly wrong.  Unfortunately it looks to me like the
scenario where it makes a difference can only come up in a race.  The logic
in pthread_mutex_trylock is somewhat intricate, so I'm not completely sure.
Can you see a way to come up with a (non-racy) regression test for this?
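
To make the semantic difference concrete, here's a throwaway sketch
(mine, not glibc code; GCC's __atomic builtins stand in for
atomic_compare_and_exchange_val_acq):

#include <stdio.h>

/* Stand-in for atomic_compare_and_exchange_val_acq: if *MEM == OLDVAL,
   store NEWVAL; either way, return the previous value of *MEM.  */
static int
cas_val (int *mem, int newval, int oldval)
{
  int expected = oldval;
  __atomic_compare_exchange_n (mem, &expected, newval, 0,
                               __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
  return expected;
}

/* ARM-style: collapse the old value to zero/nonzero.  */
static int
robust_trylock_bool (int *futex, int id)
{
  return cas_val (futex, id, 0) != 0;
}

/* m68k-style: hand the raw old value back to the caller.  */
static int
robust_trylock_val (int *futex, int id)
{
  return cas_val (futex, id, 0);
}

int
main (void)
{
  int futex = 1234;                                   /* owned by TID 1234 */
  printf ("%d\n", robust_trylock_bool (&futex, 42));  /* prints 1 */
  printf ("%d\n", robust_trylock_val (&futex, 42));   /* prints 1234 */
  return 0;
}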

I've looked at all the machines' implementations of lll_robust_trylock.
Most look just like ARM, i.e. using atomic_compare_and_exchange_val_acq
and comparing against zero.

s390 open-codes in assembly exactly what the
atomic_compare_and_exchange_val_acq would do, and then does the
erroneous comparison in C.

sh open-codes in assembly exactly what atomic_compare_and_exchange_val_acq
plus the erroneous comparison would do.

i386, x86_64, and powerpc open-code in assembly exactly what the
atomic_compare_and_exchange_val_acq would do, and don't do the erroneous
comparison.

m68k uses atomic_compare_and_exchange_val_acq without the erroneous
comparison.

The sole use of lll_robust_trylock is in pthread_mutex_trylock, right
after code that just uses atomic_compare_and_exchange_val_acq directly
on the same memory location.

So I think what we should really do is just remove lll_robust_trylock
entirely.  It doesn't let the machine encapsulate any additional magic,
because the generic code already assumes exactly what the use of the
location has to be.  This could be done either before or after my patch.
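
As a purely illustrative toy (again mine, not the actual
pthread_mutex_trylock logic), this shows the kind of thing a caller can
only do when it sees the raw old value rather than a boolean; the
FUTEX_OWNER_DIED constant below matches <linux/futex.h>:

#include <stdio.h>

#define FUTEX_OWNER_DIED 0x40000000

int
main (void)
{
  /* Pretend the previous owner (TID 1234) died holding the lock.  */
  int futex = 1234 | FUTEX_OWNER_DIED;

  int expected = 0;
  __atomic_compare_exchange_n (&futex, &expected, 42, 0,
                               __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);

  if (expected == 0)
    puts ("acquired, uncontended");
  else if (expected & FUTEX_OWNER_DIED)
    puts ("owner died; caller can recover the mutex");
  else
    puts ("held by a live owner");
  return 0;
}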

> > +#define __lll_cond_lock(futex, private)                         \
> > +  ((void)                                                       \
> > +   ({                                                           \
> > +     int *__futex = (futex);                                    \
> > +     if (__glibc_unlikely (atomic_exchange_acq (__futex, 2)))   \
> > +       __lll_lock_wait (__futex, private);                      \
> > +   }))
> 
> This is unconditionally setting the futex to 2, where my patch (based
> on Tile) sets it to 2 only if it was previously 0. Again, the
> platforms are inconsistent so it'll need resolving.

The lay of the land is pretty similar for this case.  Many have exactly
equivalent code.  Many others have code that uses one of
atomic_compare_and_exchange_{bool,val}_acq and so match the behavior of
yours rather than mine.  A few others open-code assembly that is
equivalent to one or the other.

Unlike lll_robust_trylock, there is one outlier: sparc is almost the
same (as yours), but uses atomic_compare_and_exchange_val_24_acq (a
special sparcism).  So it's not a candidate for removing entirely even
though it too has only one user (pthread_mutex_cond_lock.c).

Offhand, I don't understand the use here well enough to be completely
sure which is correct.  In the contended case, they all get to
__lll_lock_wait.  The generic C and the x86 assembly versions of that do
the equivalent of the unconditional atomic_exchange_acq.  I suspect that
is actually what's correct: if it was 0 before, that's the
unlocked/uncontended case; if it was 1 before, your code leaves it 1
(until it gets into __lll_lock_wait, anyway) even though there is now
contention, so it should be 2; and when it gets into __lll_lock_wait,
that will do an unconditional atomic_exchange_acq anyway.

Given this control flow, I think it's impossible to write any test that
can tell the difference.  Your version is just going to be slightly
suboptimal in the contended case, because it will skip installing 2 and
then do an additional atomic operation in __lll_lock_wait to install it
before going into the kernel.
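
For reference, a minimal sketch of the two variants (my toy, with
__atomic builtins standing in for our macros; 0 = unlocked, 1 = locked,
2 = locked with possible waiters):

#include <stdio.h>

/* My version: unconditionally mark the futex contended.  */
static int
cond_lock_exchange (int *futex)
{
  return __atomic_exchange_n (futex, 2, __ATOMIC_ACQUIRE);
}

/* Tile-style version: 0 -> 2 only if it was unlocked.  */
static int
cond_lock_cas (int *futex)
{
  int expected = 0;
  return !__atomic_compare_exchange_n (futex, &expected, 2, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}

int
main (void)
{
  int a = 1, b = 1;         /* locked, no waiters yet */
  cond_lock_exchange (&a);  /* a becomes 2: contention recorded now */
  cond_lock_cas (&b);       /* b stays 1 until __lll_lock_wait runs */
  printf ("a=%d b=%d\n", a, b); /* prints a=2 b=1 */
  return 0;
}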

So we just have to reason it out.  Do you agree with my reasoning above?

I've updated my version on the branch to have an explicit != 0 in the
comparison to make it slightly more clear.  (All this stuff could use
vastly more comments.)

> > +#define __lll_timedlock(futex, abstime, private)                \
> > +  ({                                                            \
> > +    int *__futex = (futex);                                     \
> > +    int __val = 0;                                              \
> > +                                                                \
> > +    if (__glibc_unlikely (atomic_exchange_acq (__futex, 1)))    \
> > +      __val = __lll_timedlock_wait (__futex, abstime, private); \
> > +    __val;                                                      \
> > +  })
> 
> This'll spread BZ 16892 (and be a binary difference on the currently
> unaffected platforms).
> 
> Should be something like:
> 
> if (__glibc_unlikely (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))

OK.  I concur with the rationale described in the bug.  But as I said
above, I'm inclined to separate that fix to be done after the pure
refactoring has gone in.
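
For the archives, here's a self-contained sketch of what the fix changes
(my toy, not the real macros).  As I read the bug, the problem is that
an unconditional exchange can overwrite a futex that is already 2
(locked, with waiters) with 1, so the eventual unlock may not wake
anyone:

#include <stdio.h>

int
main (void)
{
  /* Futex is 2: locked, with waiters.  */
  int a = 2, b = 2;

  /* Current generic code: unconditional exchange.  2 -> 1, losing the
     record of waiters.  */
  __atomic_exchange_n (&a, 1, __ATOMIC_ACQUIRE);

  /* Proposed fix: only 0 -> 1; an already-contended futex stays 2.  */
  int expected = 0;
  __atomic_compare_exchange_n (&b, &expected, 1, 0,
                               __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);

  printf ("exchange: %d  cas: %d\n", a, b); /* prints exchange: 1  cas: 2 */
  return 0;
}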

I think what I'll do now is proceed with working up a change to
eliminate lll_robust_trylock in the status quo ante.  I'm going on the
assumption that we'll conclude it's impossible to test for that issue,
but please do put your mind to coming up with a useful test case (or
else contributing to the belief that it can't be done).

That will leave just the latter two issues to be resolved after the
refactoring goes in.  For lll_cond_lock, I think what we want is no
change from my version (and the status quo ante for ARM), but that needs
to be confirmed by other people contemplating the logic.  For
lll_timedlock, I do think your change is correct and we'll do it after.

Actually changing ARM has to wait for Joseph to come back next week and
approve it.  By the time he's back, I'll be travelling for Cauldron and
then I can't be relied on to do anything before the 28th (though I might
well do some things while travelling).

I pointed out earlier that the aarch64 file is nearly identical to the
arm file.  So if Marcus is around now and able to review and approve
before Joseph is back, we could make aarch64 the first to use the new
generic file instead of arm.  (I can do a build for aarch64 and verify
that it has no effect on the compiled code.)  Then I think we could
proceed with the follow-on fixes and let ARM and other machines drop
their now-redundant files at their maintainers' leisure, thereby picking
up those fixes.

Assuming nobody tells me why any of this is a bad idea, I'll proceed in
that direction now.


Thanks,
Roland

