This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Unify pthread_once (bug 15215)


On 7 April 2014 13:37, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2014-03-28 at 19:29 -0400, Carlos O'Donell wrote:
>> David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
>>
>> We would like to unify all C-based pthread_once implementations
>> per the plan in bug 15215 for glibc 2.20.
>>
>> Your machines are on the list of C-based pthread_once implementations.
>>
>> See this for the initial discussions on the unified pthread_once:
>> https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
>>
>> The goal is to provide a single and correct C implementation of
>> pthread_once. Architectures can then build on that if they need more
>> optimal implementations, but I don't encourage that and I'd rather
>> see deep discussions on how to make one unified solution where
>> possible.
>>
>> I've also just reviewed Torvald's new pthread_once microbenchmark which
>> you can use to compare your previous C implementation with the new
>> standard C implementation (measures pthread_once latency). The primary
>> use of this test is to help provide objective proof for or against the
>> i386 and x86_64 assembly implementations.
>>
>> We are not presently converting any of the machines with custom
>> implementations, but that will be a next step after testing with the
>> help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
>>
>> If we don't hear any objections we will go forward with this change
>> in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
>> and aarch64 on a single pthread_once implementation based on sparc's C
>> implementation.
>
> So far, I've seen an okay for tile, and a question about ARM.  Will, are
> you okay with the change for ARM?

From a correctness and maintainability standpoint it looks good. I
have concerns about performance, but I will leave that call to the
respective ARM and AArch64 maintainers.

In your original post you speculate it may be possible to improve
performance on ARM:

"I'm currently also using the existing atomic_{read/write}_barrier
functions instead of not-yet-existing load_acq or store_rel functions.
I'm not sure whether the latter can have somewhat more efficient
implementations on Power and ARM; if so, and if you're concerned about
the overhead, we can add load_acq and store_rel to atomic.h and start
using it"

It would be interesting to know how much work that would be and what
the performance improvements might be like.
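
For concreteness, a rough sketch of what such helpers could look like
on top of the GCC __atomic builtins (the names below follow the
load_acq/store_rel naming from your mail and do not exist in atomic.h
today):

/* Sketch only -- not current glibc code.  Requires the __atomic
   builtins (GCC >= 4.7).  */
#define atomic_load_acq(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define atomic_store_rel(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)

On AArch64, for example, the compiler could then emit LDAR/STLR for the
fast-path load and the final store instead of separate full barriers,
which is presumably where any win would come from.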

> Any other objections to the updated patch that's attached?
>
>> > +   When forking the process, some threads can be interrupted during the second
>> > +   state; they won't be present in the forked child, so we need to restart
>> > +   initialization in the child.  To distinguish an in-progress initialization
>> > +   from an interrupted initialization (in which case we need to reclaim the
>> > +   lock), we look at the fork generation that's part of the second state: We
>> > +   can reclaim iff it differs from the current fork generation.
>> > +   XXX: This algorithm has an ABA issue on the fork generation: If an
>> > +   initialization is interrupted, we then fork 2^30 times (30b of once_control
>>
>> What's "30b?" 30 bits? Please spell it out.
>>
>> > +   are used for the fork generation), and try to initialize again, we can
>> > +   deadlock because we can't distinguish the in-progress and interrupted cases
>> > +   anymore.  */
>>
>> Would you mind filing a bug for this in the upstream bugzilla?
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=16816
>
>> It's a distinct bug from this unification work, but a valid problem.
>>
>> Can this be fixed by detecting generation counter overflow in fork
>> and failing the function call?
>
> Yes, but this would prevent us from doing more than 2^30 fork calls.
> That may not be a problem in practice -- but if so, then we won't hit
> the ABA either :)
>
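
For anyone following along without the patch in front of them, here is a
purely illustrative decoding of the once_control word as described in the
comment above; the macro names are made up, and the patch open-codes
these tests:

/* Illustrative only; not from the patch.  */
#define ONCE_IN_PROGRESS(val)  (((val) & 1) != 0)  /* bit 0: init running   */
#define ONCE_DONE(val)         (((val) & 2) != 0)  /* bit 1: init finished  */
#define ONCE_FORK_GEN(val)     ((val) & ~3)        /* remaining 30 bits     */

An interrupted initialization can be reclaimed iff ONCE_FORK_GEN (val)
differs from the current __fork_generation; after 2^30 forks the
generation wraps around, which is the ABA issue now filed as BZ #16816.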
>> > +      do
>> > +   {
>> > +     /* Check if the initialization has already been done.  */
>> > +     if (__builtin_expect ((val & 2) != 0, 1))
>>
>> Use __glibc_likely.
>>
>> e.g. if (__glibc_likely ((val & 2) != 0))
>>
>> This is the fast path that we are testing for in the microbenchmark?
>
> Yes.
>
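
For a quick local sanity check on a given target, a minimal stand-alone
approximation of that fast-path measurement could look like the
following (this is not Torvald's benchmark, just a rough stand-in):

/* Rough fast-path latency check for pthread_once; not the glibc
   benchtest.  Build with -O2; older setups may also need -lpthread
   and -lrt.  */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;

static void
do_init (void)
{
}

int
main (void)
{
  struct timespec start, end;
  const long iters = 10000000;

  /* The first call takes the slow path and flips the state to done.  */
  pthread_once (&once, do_init);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (long i = 0; i < iters; i++)
    pthread_once (&once, do_init);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
  printf ("%.2f ns per fast-path pthread_once call\n", ns / iters);
  return 0;
}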
>> > +       return 0;
>> > +
>> > +     oldval = val;
>> > +     /* We try to set the state to in-progress and having the current
>> > +        fork generation.  We don't need atomic accesses for the fork
>> > +        generation because it's immutable in a particular process, and
>> > +        forked child processes start with a single thread that modified
>> > +        the generation.  */
>> > +     newval = __fork_generation | 1;
>>
>> OT: I wonder if Valgrind will report a benign race in accessing __fork_generation.
>
> Perhaps.  I believe that eventually, lots of this and similar variables
> should be atomic-typed and/or accessed with relaxed-memory-order atomic
> loads.  This would clarify that we expect concurrent accesses and that
> they don't constitute a data race.
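
As a purely illustrative example of that direction (GCC __atomic
spelling, not a proposal for this patch), the read of __fork_generation
could be wrapped in an explicitly relaxed load; it adds no barrier, but
the concurrent access is then visible to tools that understand atomics
rather than looking like a plain racy read:

/* Sketch only; the declaration is a stand-in for the real internal
   one in glibc.  */
extern unsigned long int __fork_generation;

static inline unsigned long int
load_fork_generation (void)
{
  /* Relaxed order: typically the same plain load in the generated
     code, but the concurrent access is explicit.  */
  return __atomic_load_n (&__fork_generation, __ATOMIC_RELAXED);
}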
>
>
> [BZ #15215]
> * nptl/sysdeps/unix/sysv/linux/sparc/pthread_once.c: Moved to ...
> * nptl/sysdeps/unix/sysv/linux/pthread_once.c: ... here.  Add missing
> memory barriers.  Add comments.
> * sysdeps/unix/sysv/linux/aarch64/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/arm/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/ia64/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/m68k/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/mips/nptl/pthread_once.c: Remove file.
> * sysdeps/unix/sysv/linux/tile/nptl/pthread_once.c: Remove file.
>
> ChangeLog.hppa:
>         [BZ #15215]
>         * sysdeps/unix/sysv/linux/hppa/nptl/pthread_once.c: Remove file.
>



-- 
Will Newton
Toolchain Working Group, Linaro

