This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.



Re: [PATCH] Unify pthread_once (bug 15215)


On Wed, May 08, 2013 at 10:47:26PM +0200, Torvald Riegel wrote:
> On Wed, 2013-05-08 at 13:51 -0400, Rich Felker wrote:
> > On Wed, May 08, 2013 at 04:43:57PM +0200, Torvald Riegel wrote:
> > > Note that this will make a call to pthread_once that doesn't need to
> > > actually run the init routine slightly slower due to the additional
> > > acquire barrier.  If you're really concerned about this overhead, speak
> > > up.  There are ways to avoid it, but it comes with additional complexity
> > > and bookkeeping.
> > 
> > On the one hand, I think it should be avoided if at all possible.
> > pthread_once is the correct, canonical way to do initialization (as
> > opposed to hacks like library init functions or global ctors), and the
> > main doubt lots of people have about doing it the correct way is that
> > they're going to kill performance if they call pthread_once from every
> > point where initialization needs to have been completed. If every call
> > imposes memory synchronization, performance might become a real issue
> > discouraging people from following best practices for library
> > initialization.
> 
> Well, what we precisely need is that the initialization happens-before
> (ie, the relation from the, say, C11 memory model) every call that does
> not in fact initialize.  If initialization happened on another thread,
> you need to synchronize.  But from there on, you are essentially free to
> establish this in any way you want.  And there are ways, because
> happens-before is more-or-less transitive.
> 
> > On the other hand, I don't think it's conforming to elide the barrier.
> > POSIX states (XSH 4.11 Memory Synchronization):
> > 
> > "The pthread_once() function shall synchronize memory for the first
> > call in each thread for a given pthread_once_t object."
> 
> No, it's not.  You could see just parts of the effects of the
> initialization; potentially reading garbage can't be the intended
> semantics :)

The work of synchronizing memory should take place at the end of the
pthread_once call that actually does the initialization, rather than
in the other threads which synchronize. This is the way the x86 memory
model naturally works, but perhaps it's prohibitive to achieve on
other architectures. However, the idea is that pthread_once only runs
init routines a small finite number of times, so even if you had to do
some horrible hack that makes the synchronization on return 1000x
slower (e.g. a syscall), it would still be better than incurring the
cost of a full acquire barrier in each subsequent call, which ideally
should have the same cost as a call to an empty function.

> > Since it's impossible to track whether a call is the first call in a
> > given thread
> 
> Are you sure about this? :)

It's impossible with bounded memory requirements, and thus impossible
in general (allocating memory for the tracking might fail).

> > this means every call to pthread_once() is required to
> > be a full memory barrier.
> 
> Note that we do not need a full memory barrier, just an acquire memory
> barrier.  So this only matters on architectures with memory models that
> give weaker per-default ordering guarantees.  For example, this doesn't
> add any hardware barrier instructions on x86 or Sparc TSO.  But for
> Power and ARM it does.

Yes, I see that.
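
For concreteness, the acquire-only check discussed above can be sketched
with C11 atomics. This is only an illustration under stated assumptions,
not glibc's code: sketch_once_t, sketch_once and the ONCE_* states are
hypothetical names, waiters simply spin with sched_yield() instead of
using a futex, and cancellation of the init routine is not handled.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical states and type; not the glibc representation. */
enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } sketch_once_t;

#define SKETCH_ONCE_INIT { ONCE_NEW }

static void sketch_once(sketch_once_t *o, void (*init)(void))
{
    /* Fast path: an acquire load (not a full barrier) that pairs with
       the release store below, so the initializer's writes
       happen-before everything after this return. */
    if (atomic_load_explicit(&o->state, memory_order_acquire) == ONCE_DONE)
        return;

    int expected = ONCE_NEW;
    if (atomic_compare_exchange_strong_explicit(&o->state, &expected,
                                                ONCE_RUNNING,
                                                memory_order_acquire,
                                                memory_order_acquire)) {
        init();
        /* Release store: publishes the initializer's writes. */
        atomic_store_explicit(&o->state, ONCE_DONE, memory_order_release);
        return;
    }

    /* Another thread is initializing; wait until it publishes. */
    while (atomic_load_explicit(&o->state, memory_order_acquire) != ONCE_DONE)
        sched_yield();
}

/* Small single-threaded demo of the "runs once" property. */
static int init_calls;
static void count_init(void) { init_calls++; }

int sketch_once_demo(void)
{
    static sketch_once_t once = SKETCH_ONCE_INIT;
    sketch_once(&once, count_init);   /* slow path: runs count_init */
    sketch_once(&once, count_init);   /* fast path: acquire load only */
    assert(init_calls == 1);
    return init_calls;
}
```

On x86 or Sparc TSO the acquire load should compile to an ordinary load;
on Power and ARM it needs a barrier or a load-acquire instruction, which
is the per-call cost in question.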

> > I suspect this is unintended, and we should
> > perhaps file a bug report with the Austin Group and see if the
> > requirement can be relaxed.
> 
> I don't think that other semantics are intended.  If you return from
> pthread_once(), initialization should have happened before that.  If it
> doesn't, you don't really know whether initialization happened once, so
> programs would be forced to do their own synchronization.

I think my confusion is merely that POSIX does not define the phrase
"synchronize memory", and in the absence of a definition, "full memory
barrier" (both release and acquire semantics) is the only reasonable
interpretation I can find. In other words, it seems like a
pathological conforming program could attempt to use the language in
the specification to use pthread_once as a release barrier. I'm not
sure if there are ways this could be meaningfully arranged (i.e. with
well-defined ordering); off-hand, I would think tricks with cancelling
an in-progress invocation of pthread_once might make it possible.

By the way, cancellation probably makes the above POSIX text incorrect
anyway; a thread could call pthread_once on the same pthread_once_t
object more than once, with the second call not being a no-op, if the
initialization routine for the first call is cancelled and the second
call takes place from a cancellation cleanup handler.
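
A rough sketch of that cancellation scenario, assuming an implementation
that follows the POSIX rule that a cancelled init routine leaves the
pthread_once_t object as if pthread_once had never been called. The
names init_routine, retry_in_cleanup, worker and cancelled_once_demo are
made up for the illustration.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static int init_entries;   /* how many times the init routine was entered */

static void init_routine(void)
{
    /* Only the first entry acts on the pending cancellation request. */
    if (++init_entries == 1)
        pthread_testcancel();
}

static void retry_in_cleanup(void *arg)
{
    (void)arg;
    /* Second call on the same object from the same thread. */
    pthread_once(&once, init_routine);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(retry_in_cleanup, NULL);
    /* Deferred cancellation: the request stays pending until the
       thread reaches a cancellation point. */
    pthread_cancel(pthread_self());
    pthread_once(&once, init_routine);   /* init routine is cancelled */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Runs the scenario in a fresh thread and reports the entry count. */
int cancelled_once_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return init_entries;
}
```

If the implementation behaves as assumed, the init routine is entered
twice by the same thread, so the call made from the cleanup handler is
not a no-op even though it is that thread's second call on the object.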

Rich

