This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] correction for PPC __compare_and_swap


On Fri, 4 May 2001 brianmc1@us.ibm.com wrote:

> This patch corrects an error in PPC __compare_and_swap.
>
> An isync is necessary after acquisition of a lock to discard all prefetched
> instructions.  Page 335 of the book The PowerPC Architecture:  A
> Specification For A New Family Of RISC Processors states the
> following:  The "sync" instruction is execution synchronizing.  It is not
> context synchronizing, and therefore need not discard prefetched
> instructions.  For context synchronization you can see page 371 where the
> following instructions rfi, sc and isync can be used.   End of quote.
>
> What can happen is the processor could speculatively load values into
> registers as it is acquiring the lock and there is an opportunity to have
> fetched stale data because another processor still owns the lock and is
> modifying data that is protected by the lock.  The processor that is trying
> to acquire the lock has speculatively loaded the data the other processor
> is modifying.  The processor finally succeeds in acquiring the lock and
> continues on with the data it had already loaded.  The sync at the end does
> not cause the prefetched data to be discarded.  The isync causes all the
> speculative execution to be thrown away and re-executed.

If you were to implement the macros READ_MEMORY_BARRIER,
WRITE_MEMORY_BARRIER and MEMORY_BARRIER for the PowerPC, how would you
assign the instructions to these? It seems that isync resembles a read
barrier, whereas sync is more like a write barrier that still allows
stale reads.  How about a full barrier?

> Therefore if written as separate routines then there would only be one sync
> and one isync per lock/unlock pair which will give better performance and
> thus better scalability.

How about having no barriers at all in __compare_and_swap and let
the caller take care of all the memory synchronization?

It's probably best to assume that compare_and_swap() has no
synchronizing properties, only atomic access to one location that is
not ordered with respect to any other. The user of compare_and_swap()
should always use memory barrier macros as appropriate.  Even
compare_and_swap_with_release_semantics can't be counted on to do
anything particular because it's just mapped to compare_and_swap where
it is not available.

The MEMORY_BARRIER() macro should provide a full fence that no memory
accesses (read or write) can cross.

The WRITE_MEMORY_BARRIER() macro should provide, at the very least, a
write-write fence, for use in situations like ensuring that the update
to a list node is flushed before the pointer is linked into a list.
I.e. a situation with no read dependencies.

The READ_MEMORY_BARRIER() macro should provide, at the very least, a fence
against stale reads, useful in ensuring that dependent reads access
coherent data. For example, loading a pointer from one location and then
dereferencing it to gain access to the referenced location should be
separated by a READ_MEMORY_BARRIER().

If no specialized read or write barrier is available, the
corresponding macro is just mapped to MEMORY_BARRIER().
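
As a rough sketch of the list-node case, the macros might be used as
shown here. This is not the glibc code: it assumes a C11 fence as a
stand-in for the architecture's full barrier, a single writer, and a
hypothetical struct node, list_head, publish() and read_first() made up
for illustration. Both specialized macros take the fallback path to
MEMORY_BARRIER().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: a C11 fence stands in for the architecture's full barrier. */
#define MEMORY_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* No specialized barriers in this sketch: map both to the full one. */
#ifndef WRITE_MEMORY_BARRIER
#define WRITE_MEMORY_BARRIER() MEMORY_BARRIER()
#endif
#ifndef READ_MEMORY_BARRIER
#define READ_MEMORY_BARRIER() MEMORY_BARRIER()
#endif

/* Hypothetical list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Shared list head; atomic so the pointer itself is read and written whole. */
static _Atomic(struct node *) list_head = NULL;

/* Writer: fill in the node, fence, then link it (single writer assumed). */
static void publish(struct node *n, int value)
{
    n->value = value;
    n->next = atomic_load_explicit(&list_head, memory_order_relaxed);
    WRITE_MEMORY_BARRIER();   /* node contents become visible before the link */
    atomic_store_explicit(&list_head, n, memory_order_relaxed);
}

/* Reader: load the pointer, fence, then dereference it. */
static int read_first(int fallback)
{
    struct node *n = atomic_load_explicit(&list_head, memory_order_relaxed);
    if (n == NULL)
        return fallback;
    READ_MEMORY_BARRIER();    /* pointer load is ordered before the dereference */
    return n->value;
}
```

A port that has cheaper specialized instructions would define
READ_MEMORY_BARRIER() and WRITE_MEMORY_BARRIER() before the #ifndef
checks, and the callers would not need to change.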

With these macros, the synchronization code in functions like
__pthread_lock() can be constructed to do what is needed without
depending on compare_and_swap() to have built-in fences.
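
To illustrate the division of labor, here is a minimal sketch of a lock
whose fences live in the caller rather than in compare_and_swap(). It is
not the actual __pthread_lock() code: the spinlock and the names
sketch_lock(), sketch_trylock() and sketch_unlock() are hypothetical,
and C11 atomics with relaxed ordering are assumed as a stand-in for a
barrier-free compare_and_swap().

```c
#include <stdatomic.h>

/* Assumption: a C11 fence stands in for the architecture's full barrier. */
#define MEMORY_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* Atomic on one location only; relaxed ordering implies no fence.
   Returns nonzero if *p held oldval and has been set to newval. */
static int compare_and_swap(atomic_long *p, long oldval, long newval)
{
    return atomic_compare_exchange_strong_explicit(
        p, &oldval, newval, memory_order_relaxed, memory_order_relaxed);
}

/* Hypothetical spinlock: 0 means free, 1 means held. */
static void sketch_lock(atomic_long *lock)
{
    while (!compare_and_swap(lock, 0, 1))
        ;                 /* spin; a real lock would back off or block */
    MEMORY_BARRIER();     /* protected accesses stay after the acquisition */
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was held. */
static int sketch_trylock(atomic_long *lock)
{
    if (!compare_and_swap(lock, 0, 1))
        return 0;
    MEMORY_BARRIER();     /* same fence as in sketch_lock() */
    return 1;
}

static void sketch_unlock(atomic_long *lock)
{
    MEMORY_BARRIER();     /* protected accesses stay before the release */
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}
```

Written this way there is exactly one barrier on the acquire path and one
on the release path, and each can later be weakened to whatever the
architecture offers without touching compare_and_swap() itself.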

