This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Unify pthread_once (bug 15215)


On 28-03-2014 20:29, Carlos O'Donell wrote:
> David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
>
> We would like to unify all C-based pthread_once implmentations
> per the plan in bug 15215 for glibc 2.20.
>
> Your machines are on the list of C-based pthread_once implementations.
>
> See this for the intial discussions on the unified pthread_once:
> https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
>
> The goal is to provide a single and correct C implementation of 
> pthread_once. Architectures can then build on that if they need more 
> optimal implementations, but I don't encourage that and I'd rather
> see deep discussions on how to make one unified solution where
> possible.
>
> I've also just reviewed Torvald's new pthread_once microbenchmark which
> you can use to compare your previous C implementation with the new
> standard C implementation (measures pthread_once latency). The primary
> use of this test is to help provide objective proof for or against the
> i386 and x86_64 assembly implementations.
>
> We are not presently converting any of the machines with custom
> implementations, but that will be a next step after testing with the
> help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
>
> If we don't hear any objections we will go forward with this change
> in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
> and aarch64 on a single pthread_once implementation based on sparc's C
> implementation.
>
> Any objections to this cleanup for 2.20?
>
I tested the patch and benchmark on PowerPC (POWER7) and the results looks good:

* Current code:
"duration": 5.08322e+09, "iterations": 2.2037e+08, "max": 244.863, "min": 22.08, "mean": 23.0668
"duration": 5.08316e+09, "iterations": 2.20479e+08, "max": 771.08, "min": 21.984, "mean": 23.0551
"duration": 5.08306e+09, "iterations": 2.21093e+08, "max": 53.966, "min": 22.052, "mean": 22.9906
"duration": 5.0833e+09, "iterations": 2.20062e+08, "max": 347.895, "min": 22.15, "mean": 23.0994
"duration": 5.08277e+09, "iterations": 2.20699e+08, "max": 632.479, "min": 21.997, "mean": 23.0303

* Optimization:
"duration": 4.97694e+09, "iterations": 8.42834e+08, "max": 134.181, "min": 5.405, "mean": 5.90501
"duration": 4.9758e+09, "iterations": 8.66952e+08, "max": 29.163, "min": 5.405, "mean": 5.73941
"duration": 4.9778e+09, "iterations": 8.51788e+08, "max": 40.819, "min": 5.405, "mean": 5.84394
"duration": 4.97413e+09, "iterations": 8.52432e+08, "max": 37.089, "min": 5.488, "mean": 5.83523
"duration": 4.97795e+09, "iterations": 8.43376e+08, "max": 163.703, "min": 5.405, "mean": 5.90241

I am running on a 18 cores machine, so I guess the 'max' is due a timing issue from os jitter.
Digging up on current code and the unified one, I noted PowerPC one is currently doing load+and+store
condition followed by a full barrier.  This is overkill for some scenarios: if the initialization is
done (val & 2) we don't need a full barrier (isync), since the requirement (as Tovalds has explained)
is just a load acquire.

I ran make check and results looks good. I also checked with GCC issues that originated the fix
that added the release barrier in the code (gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10) and
it does not show the issue with new implementation.

Finally, I also check if by replacing the 'atomic_read_barrier' and 'atomic_write_barrier' with
a load acquire and load release on POWER would shows any difference. The result are not conclusive:

"duration": 4.97196e+09, "iterations": 8.79836e+08, "max": 22.874, "min": 5.405, "mean": 5.651
"duration": 4.97144e+09, "iterations": 8.55294e+08, "max": 270.72, "min": 5.4, "mean": 5.81256
"duration": 4.97496e+09, "iterations": 8.67163e+08, "max": 27.274, "min": 5.405, "mean": 5.73705
"duration": 4.97603e+09, "iterations": 8.61568e+08, "max": 41.631, "min": 5.405, "mean": 5.77556
"duration": 4.97347e+09, "iterations": 8.54189e+08, "max": 41.457, "min": 5.405, "mean": 5.82244


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]