This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: PowerPC: libc single-thread lock optimization

From: Torvald Riegel <triegel at redhat dot com>
To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
Cc: libc-alpha at sourceware dot org
Date: Tue, 29 Apr 2014 18:22:23 +0200
Subject: Re: PowerPC: libc single-thread lock optimization
Authentication-results: sourceware.org; auth=none
References: <5343F8F1 dot 4000400 at linux dot vnet dot ibm dot com> <535ECADE dot 2050004 at linux dot vnet dot ibm dot com> <20140428214938 dot 3B10F2C3A13 at topped-with-meat dot com> <535ED72A dot 5060203 at linux dot vnet dot ibm dot com>

On Mon, 2014-04-28 at 19:33 -0300, Adhemerval Zanella wrote:
> On 28-04-2014 18:49, Roland McGrath wrote:
> > Heretofore sysdeps/CPU/bits/atomic.h is for pure CPU-based implementations.
> > In a few cases there exists a sysdeps/unix/sysv/linux/CPU/bits/atomic.h as
> > well because it needs to use kernel support.
> >
> > This is something somewhere in between: you are not depending directly on
> > specific facilities outside the pure CPU facilities; but you are depending
> > on library infrastructure and associated assumptions that do not hold in
> > the general case of using the atomic macros in arbitrary contexts.
> > Furthermore, you are defining SINGLE_THREAD_P to depend on NPTL
> > implementation details.  IMHO neither of these things belong in a
> > sysdeps/CPU/bits/atomic.h file.
> >
> > The lowlevellock.h change doesn't have those issues, so I'd suggest you
> > send that separately and it should go in easily.
> >
> >
> > Thanks,
> > Roland
> >
> I tend to agree with you; however, the x86_64 implementation (sysdeps/x86_64/bits/atomic.h)
> itself relies on NPTL definitions (the check using offsetof (tcbhead_t, multiple_threads)).
> And the idea of changing atomic.h is to simplify the logic needed to add this optimization:
> instead of changing the macros in lowlevellock.h and other atomic usage, it implements
> the optimization in the atomics themselves.

I agree with Roland that atomic.h shouldn't have the optimization; to
me, the strongest reason is that we might need atomics that actually
synchronize independently of whether we have spawned a thread or used
cancellation.  Also, having this optimization in the atomics will make
it harder to move to, say, C11 atomics; we'd have to keep the wrappers.
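
To make the concern concrete, here is a minimal sketch, not glibc code.
The flag demo_multiple_threads and both function names are hypothetical
stand-ins for the NPTL "multiple threads" state and for an atomic.h-style
wrapper.  The first function takes the single-thread shortcut inside the
atomic itself; the second is a plain C11 atomic that always synchronizes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NPTL "multiple threads" flag. */
static bool demo_multiple_threads = false;

/* Wrapper with the single-thread shortcut built into the atomic itself.
   When the flag is clear, it replaces the atomic read-modify-write with a
   separate load and store.  That is only safe if nothing outside this
   thread can touch *p, which the wrapper has no way of knowing (for
   example, *p might live in memory shared with another process). */
static inline int demo_fetch_add_st_opt(atomic_int *p, int v)
{
  assert(p != NULL);
  if (!demo_multiple_threads) {
    int old = atomic_load_explicit(p, memory_order_relaxed);
    atomic_store_explicit(p, old + v, memory_order_relaxed);
    return old;
  }
  return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
}

/* Plain C11 atomic: always a real read-modify-write, regardless of
   whether a thread has been spawned. */
static inline int demo_fetch_add_always(atomic_int *p, int v)
{
  assert(p != NULL);
  return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
}
```

Both return the previous value, so callers cannot tell them apart in a
truly isolated single-threaded run.  The difference only shows up when
something other than an NPTL thread can access the object, and that is
the case where callers would need the second form.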

> I bring up x86 because it is usually the reference implementation, and it sometimes puzzles
> me that copying the same idea to another platform raises architectural questions.  But I concede
> that the reference itself may not have opted for the best solution in the first place.
> 
> So if I have understood correctly, the optimization to check for single-thread and opt to
> use locks is to be focused solely on lowlevellock?  If so, how do you suggest other archs
> mimic the x86 optimization on atomic.h primitives?  Should other archs follow x86_64 and
> check the __libc_multiple_threads value instead?  This could be a way; however, it is mostly
> redundant: the TCB definition already contains the information required, so there is no
> need to keep track of it in another memory reference.  Also, following the x86_64 idea, it checks
> the TCB header information in sysdeps/CPU/bits/atomic.h, but __libc_multiple_threads
> in lowlevellock.h.  Which is the correct guideline for other archs?

From a synchronization perspective, I think any single-thread
optimizations belong in the specific concurrent algorithms (e.g.,
mutexes, condvars, ...):
* Doing the optimization at the lowest level (i.e., the atomics) might be
insufficient because if there's indeed just one thread, then lots of
synchronization code can be simplified far beyond just avoiding
atomics (e.g., avoiding loops, checks, ...).
* The mutexes, condvars, etc. are what's exposed to the user, so the
assumptions about whether there really is no concurrency only make
sense there.  For example, a single-thread program can still have a
process-shared condvar, so the condvar would need to use
synchronization.
* We currently don't have other intra-process sources of concurrency
than NPTL threads, but if we get another source (e.g., due to trying to
support accelerators), we'd likely need low-level synchronization to
communicate with the other thing -- and this would be independent of
whether we have threads.
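
As a rough illustration of the points above, the following sketch puts
the single-thread check into the lock algorithm instead of the atomics.
It is not glibc's lowlevellock; sketch_mutex, sketch_multiple_threads and
the function names are all hypothetical.  The lock knows whether it is
process-shared, so it can decide for itself when skipping
synchronization is acceptable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the NPTL "multiple threads" flag. */
static bool sketch_multiple_threads = false;

typedef struct {
  atomic_int state;   /* 0 = unlocked, 1 = locked */
  bool pshared;       /* true if the lock may be shared across processes */
} sketch_mutex;

/* The decision lives in the algorithm: skip synchronization only when
   there is a single thread AND the lock is not process-shared. */
static inline bool sketch_can_skip_sync(const sketch_mutex *m)
{
  return !sketch_multiple_threads && !m->pshared;
}

static inline bool sketch_mutex_trylock(sketch_mutex *m)
{
  if (sketch_can_skip_sync(m)) {
    /* Single-thread path: no compare-and-swap and no retry loop. */
    if (atomic_load_explicit(&m->state, memory_order_relaxed) != 0)
      return false;
    atomic_store_explicit(&m->state, 1, memory_order_relaxed);
    return true;
  }
  /* General path: a real atomic compare-and-swap with acquire ordering. */
  int expected = 0;
  return atomic_compare_exchange_strong_explicit(
      &m->state, &expected, 1,
      memory_order_acquire, memory_order_relaxed);
}

static inline void sketch_mutex_unlock(sketch_mutex *m)
{
  assert(atomic_load_explicit(&m->state, memory_order_relaxed) == 1);
  if (sketch_can_skip_sync(m))
    atomic_store_explicit(&m->state, 0, memory_order_relaxed);
  else
    atomic_store_explicit(&m->state, 0, memory_order_release);
}
```

A process-shared lock takes the general path even while the flag says
there is only one thread, which a shortcut hidden inside the atomics
could not arrange.  A real implementation would also have to block and
wake waiters; that part is left out here.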

I'm wondering whether we should move the x86 atomics special case into
the concurrent algorithms.  Otherwise, C implementations of
synchronization operations that use atomics for cross-process
synchronization won't work, I guess.  We currently don't have those on
x86, I believe (use of just lowlevellock should be fine), but I suppose
things won't stay this way.

Follow-Ups:
- Re: PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella

References:
- PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella
- Re: PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella
- Re: PowerPC: libc single-thread lock optimization
  - From: Roland McGrath
- Re: PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella
