This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Unify pthread_once (bug 15215)


On Mon, 2014-04-07 at 13:46 +0100, Will Newton wrote:
> On 7 April 2014 13:37, Torvald Riegel <triegel@redhat.com> wrote:
> > On Fri, 2014-03-28 at 19:29 -0400, Carlos O'Donell wrote:
> >> David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
> >>
> >> We would like to unify all C-based pthread_once implementations
> >> per the plan in bug 15215 for glibc 2.20.
> >>
> >> Your machines are on the list of C-based pthread_once implementations.
> >>
> >> See this for the initial discussions on the unified pthread_once:
> >> https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
> >>
> >> The goal is to provide a single and correct C implementation of
> >> pthread_once. Architectures can then build on that if they need more
> >> efficient implementations, but I don't encourage that and I'd rather
> >> see deep discussions on how to make one unified solution where
> >> possible.
> >>
> >> I've also just reviewed Torvald's new pthread_once microbenchmark,
> >> which you can use to compare your previous C implementation with the
> >> new standard C implementation (it measures pthread_once latency). The
> >> primary use of this test is to help provide objective evidence for or
> >> against the i386 and x86_64 assembly implementations.
> >>
> >> We are not presently converting any of the machines with custom
> >> implementations, but that will be a next step after testing with the
> >> help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
> >>
> >> If we don't hear any objections we will go forward with this change
> >> in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
> >> and aarch64 on a single pthread_once implementation based on sparc's C
> >> implementation.
> >
> > So far, I've seen an okay for tile, and a question about ARM.  Will, are
> > you okay with the change for ARM?
> 
> From a correctness and maintainability standpoint it looks good. I
> have concerns about the performance but I will leave that call to the
> respective ARM and AArch64 maintainers.
> 
> In your original post you speculated that it may be possible to
> improve performance on ARM:
> 
> "I'm currently also using the existing atomic_{read/write}_barrier
> functions instead of not-yet-existing load_acq or store_rel functions.
> I'm not sure whether the latter can have somewhat more efficient
> implementations on Power and ARM; if so, and if you're concerned about
> the overhead, we can add load_acq and store_rel to atomic.h and start
> using it"
> 
> It would be interesting to know how much work that would be and what
> the performance improvements might be like.
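
For concreteness, generic fallbacks for such functions might look
roughly like the sketch below.  The names load_acq/store_rel and the
fallback to the existing barriers are just how I'd sketch it, not
current atomic.h API, and (as I say below) I'm not certain the existing
barriers are strictly equivalent to C11 acquire/release:

/* Hypothetical generic fallbacks for atomic.h; an architecture with
   cheaper native acquire/release operations would override these.  */
#define atomic_load_acq(mem)                \
  ({ __typeof (*(mem)) __v = *(mem);        \
     atomic_read_barrier ();                \
     __v; })

#define atomic_store_rel(mem, value)        \
  do                                        \
    {                                       \
      atomic_write_barrier ();              \
      *(mem) = (value);                     \
    }                                       \
  while (0)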

I had a quick look at the arm and aarch64 barrier definitions, and they
only define a full barrier, not separate read/write barriers.  I
believe that is part of the performance problem, since a full barrier
should be significantly more costly than an acquire barrier.
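
To illustrate, on aarch64 the difference would be roughly the
following; atomic_acquire_barrier is a made-up name, and the exact
instruction the current full barrier expands to may differ:

/* Full barrier: orders all prior loads and stores against all later
   ones.  */
#define atomic_full_barrier() \
  __asm__ __volatile__ ("dmb ish" ::: "memory")

/* Hypothetical acquire barrier: only orders prior loads against later
   loads and stores, which should be cheaper.  */
#define atomic_acquire_barrier() \
  __asm__ __volatile__ ("dmb ishld" ::: "memory")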

I guess the read/write barriers as used in glibc are semantically
equivalent to acquire/release as in C11, but I'm not quite sure, given
that some architectures use stronger barriers for read/write than for
acquire/release.  Cleaning that up would require reviewing plenty of
code, but one could also start incrementally, leaving the existing
barrier definitions unchanged and reviewing their uses one by one.  In
the long term, I think we would benefit from using C11 atomics
throughout glibc; in some cases existing custom assembly might be
faster (e.g., that has been one comment regarding, IIRC, the powerpc
low-level locks), but maybe we can achieve the same with custom memory
orders for atomics or something similar.
In any case, cleaning this up is not specific to pthread_once.
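
As a rough sketch of that direction, a pthread_once-style fast path
with C11 atomics could look like this; the once-control state encoding
here is simplified and hypothetical:

#include <stdatomic.h>

/* Return nonzero if the init routine has already run.  The acquire
   load synchronizes with the release store performed by the thread
   that ran the init routine, so that thread's side effects are
   visible.  */
static int
once_already_done (atomic_int *once_control)
{
  return atomic_load_explicit (once_control, memory_order_acquire) == 1;
}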

Second, the suggested mappings from C11 acquire/release to arm
(http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html) show different
instruction sequences for acquire loads and acquire barriers, but I
don't know whether these would result in a measurable performance
difference.
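
For reference, the two ARMv7 sequences that page gives for an acquire
load look roughly like this, written as inline asm purely for
illustration:

/* Variant 1: plain load followed by a full barrier.  */
static inline int
load_acquire_dmb (int *p)
{
  int v;
  __asm__ __volatile__ ("ldr %0, [%1]\n\t"
                        "dmb ish"
                        : "=r" (v) : "r" (p) : "memory");
  return v;
}

/* Variant 2: load, always-taken conditional branch to create a
   control dependency, then an isb.  */
static inline int
load_acquire_ctrlisb (int *p)
{
  int v;
  __asm__ __volatile__ ("ldr %0, [%1]\n\t"
                        "teq %0, %0\n\t"
                        "beq 1f\n"
                        "1:  isb"
                        : "=&r" (v) : "r" (p) : "memory", "cc");
  return v;
}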

I'd appreciate input from architecture maintainers, especially from
those maintaining archs with weaker memory models such as arm.

