This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Unify pthread_once (bug 15215)


On 7 April 2014 13:37, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2014-03-28 at 19:29 -0400, Carlos O'Donell wrote:
>> David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
>>
>> We would like to unify all C-based pthread_once implementations
>> per the plan in bug 15215 for glibc 2.20.
>>
>> Your machines are on the list of C-based pthread_once implementations.
>>
>> See this for the initial discussions on the unified pthread_once:
>> https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
>>
>> The goal is to provide a single and correct C implementation of
>> pthread_once. Architectures can then build on that if they need more
>> optimal implementations, but I don't encourage that and I'd rather
>> see deep discussions on how to make one unified solution where
>> possible.
>>
>> I've also just reviewed Torvald's new pthread_once microbenchmark which
>> you can use to compare your previous C implementation with the new
>> standard C implementation (measures pthread_once latency). The primary
>> use of this test is to help provide objective proof for or against the
>> i386 and x86_64 assembly implementations.
>>
>> We are not presently converting any of the machines with custom
>> implementations, but that will be a next step after testing with the
>> help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
>>
>> If we don't hear any objections we will go forward with this change
>> in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
>> and aarch64 on a single pthread_once implementation based on sparc's C
>> implementation.
>
> So far, I've seen an okay for tile, and a question about ARM.  Will, are
> you okay with the change for ARM?

From a correctness and maintainability standpoint it looks good. I
have concerns about performance, but I will leave that call to the
respective ARM and AArch64 maintainers.

In your original post you speculate it may be possible to improve
performance on ARM:

"I'm currently also using the existing atomic_{read/write}_barrier
functions instead of not-yet-existing load_acq or store_rel functions.
I'm not sure whether the latter can have somewhat more efficient
implementations on Power and ARM; if so, and if you're concerned about
the overhead, we can add load_acq and store_rel to atomic.h and start
using it"

It would be interesting to know how much work that would be and what
the performance improvements might be like.
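
For concreteness, a rough sketch of what such helpers could look like
on top of the GCC __atomic builtins (the names below follow the
load_acq/store_rel naming from your mail and do not exist in atomic.h
today):

/* Sketch only -- not current glibc code.  Requires the __atomic
   builtins (GCC >= 4.7).  */
#define atomic_load_acq(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define atomic_store_rel(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)

On AArch64, for example, the compiler could then emit LDAR/STLR for the
fast-path load and the final store instead of separate full barriers,
which is presumably where any win would come from.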

> Any other objections to the updated patch that's attached?
>
>> > +   When forking the process, some threads can be interrupted during the second
>> > +   state; they won't be present in the forked child, so we need to restart
>> > +   initialization in the child.  To distinguish an in-progress initialization
>> > +   from an interrupted initialization (in which case we need to reclaim the
>> > +   lock), we look at the fork generation that's part of the second state: We
>> > +   can reclaim iff it differs from the current fork generation.
>> > +   XXX: This algorithm has an ABA issue on the fork generation: If an
>> > +   initialization is interrupted, we then fork 2^30 times (30b of once_control
>>
>> What's "30b?" 30 bits? Please spell it out.
>>
>> > +   are used for the fork generation), and try to initialize again, we can
>> > +   deadlock because we can't distinguish the in-progress and interrupted cases
>> > +   anymore.  */
>>
>> Would you mind filing a bug for this in the upstream bugzilla?
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=16816
>
>> It's a distinct bug from this unification work, but a valid problem.
>>
>> Can this be fixed by detecting generation counter overflow in fork
>> and failing the function call?
>
> Yes, but this would prevent us from doing more than 2^30 fork calls.
> That may not be a problem in practice -- but if so, then we won't hit
> the ABA either :)
>
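
For anyone following along without the patch in front of them, here is a
purely illustrative decoding of the once_control word as described in the
comment above; the macro names are made up, and the patch open-codes
these tests:

/* Illustrative only; not from the patch.  */
#define ONCE_IN_PROGRESS(val)  (((val) & 1) != 0)  /* bit 0: init running   */
#define ONCE_DONE(val)         (((val) & 2) != 0)  /* bit 1: init finished  */
#define ONCE_FORK_GEN(val)     ((val) & ~3)        /* remaining 30 bits     */

An interrupted initialization can be reclaimed iff ONCE_FORK_GEN (val)
differs from the current __fork_generation; after 2^30 forks the
generation wraps around, which is the ABA issue now filed as BZ #16816.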
>> > +      do
>> > +   {
>> > +     /* Check if the initialization has already been done.  */
>> > +     if (__builtin_expect ((val & 2) != 0, 1))
>>
>> Use __glibc_likely.
>>
>> e.g. if (__glibc_likely ((val & 2) != 0))
>>
>> This is the fast path that we are testing for in the microbenchmark?
>
> Yes.
>
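
For a quick local sanity check on a given target, a minimal stand-alone
approximation of that fast-path measurement could look like the
following (this is not Torvald's benchmark, just a rough stand-in):

/* Rough fast-path latency check for pthread_once; not the glibc
   benchtest.  Build with -O2; older setups may also need -lpthread
   and -lrt.  */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;

static void
do_init (void)
{
}

int
main (void)
{
  struct timespec start, end;
  const long iters = 10000000;

  /* The first call takes the slow path and flips the state to done.  */
  pthread_once (&once, do_init);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (long i = 0; i < iters; i++)
    pthread_once (&once, do_init);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
  printf ("%.2f ns per fast-path pthread_once call\n", ns / iters);
  return 0;
}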
>> > +       return 0;
>> > +
>> > +     oldval = val;
>> > +     /* We try to set the state to in-progress and having the current
>> > +        fork generation.  We don't need atomic accesses for the fork
>> > +        generation because it's immutable in a particular process, and
>> > +        forked child processes start with a single thread that modified
>> > +        the generation.  */
>> > +     newval = __fork_generation | 1;
>>
>> OT: I wonder if Valgrind will report a benign race in accessing __fork_generation.
>
> Perhaps.  I believe that eventually, lots of this and similar variables
> should be atomic-typed and/or accessed with relaxed-memory-order atomic
> loads.  This would clarify that we expect concurrent accesses and that
> they don't constitute a data race.
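
As a purely illustrative example of that direction (GCC __atomic
spelling, not a proposal for this patch), the read of __fork_generation
could be wrapped in an explicitly relaxed load; it adds no barrier, but
the concurrent access is then visible to tools that understand atomics
rather than looking like a plain racy read:

/* Sketch only; the declaration is a stand-in for the real internal
   one in glibc.  */
extern unsigned long int __fork_generation;

static inline unsigned long int
load_fork_generation (void)
{
  /* Relaxed order: typically the same plain load in the generated
     code, but the concurrent access is explicit.  */
  return __atomic_load_n (&__fork_generation, __ATOMIC_RELAXED);
}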
>
>
> [BZ #15215]
> * nptl/sysdeps/unix/sysv/linux/sparc/pthread_once.c: Moved to ...
> * nptl/sysdeps/unix/sysv/linux/pthread_once.c: ... here.  Add missing
> memory barriers.  Add comments.
> * sysdeps/unix/sysv/linux/aarch64/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/arm/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/ia64/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/m68k/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/mips/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/tile/nptl/pthread_once.c: Remove file.
>
> ChangeLog.hppa:
>         [BZ #15215]
>         * sysdeps/unix/sysv/linux/hppa/nptl/pthread_once.c: Remove file.
>



-- 
Will Newton
Toolchain Working Group, Linaro

