This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Re: [PATCH] PPC atomic.h add compare_exchange_val forms
Hi,
> I've struggled with this idea for HPPA since the architecture reference
> explicitly states that only a single lock word is allowed on the cacheline
> (stride is 64-128 bytes wide depending on the processor).
>
> Padding to cacheline size was attempted, but static locks would have to
> pad to the maximum cacheline size. This seemed wasteful and problematic
> for backwards binary compatibility, and it also gave the linker some
> headaches.
>
> Do you just live with the fact that two locks _could_ reside on the same
> cacheline?
Yes, I have always wondered the same thing. In fact, if any write to the
same cache line (but not to the exact reserved address) also clears the
reservation, then you could certainly slow things down by accessing items
in the structure that happen to fall in the same cache line as the atomic_t
(for example, items immediately *before* it, in the case of locks).
For example, doesn't the latest linuxthreads code use the following:

struct _pthread_fastlock
{
  long int __status;
  int __spinlock;
};
I would guess both of these fields use some form of load-and-reserve
approach on powerpc: atomic increment/decrement for the __spinlock field
versus compare_and_swap for the __status field, which could theoretically
interfere with each other.
Luckily, I do not think the __spinlock field is ever used if the platform
has compare_and_swap (only the __status field is used then). Isn't that right?
But if anything were ever written to the __spinlock field by one thread
while another was fighting to complete its reservation on the __status
field, it could slow down progress.
So some padding around mutex locks and atomic types might improve
performance (by reducing meaningless reservation clears).
On many ppc32 machines the cache line size is 32 bytes, and my quick count
puts the size of pthread_mutex_t at roughly 24 bytes, so having two
mutex_t in the same cache line is not possible. But it is certainly
possible for a simple atomic_t as defined in the kernel, for example.
I just have no idea of the magnitude of the impact, if any. I would guess
that as cache line sizes grow (to 128 bytes and beyond), the probability
of multiple atomic types falling into the same cache line gets larger, and
writes to other addresses in that cache line become more frequent as well.
Perhaps this only impacts performance slightly, if at all. I don't know.
Has anyone at IBM ever measured this impact, or even found an application
that by chance ended up in livelock caused by having two atomic_t on the
same cache line?
Kevin