This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] S/390: Fix two issues with the IFUNC optimized mem* routines
- From: Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 06 Sep 2012 15:09:09 +0200
- References: <20120829104421.GA27985@bart> <503F7FCA.40503@twiddle.net> <503F8328.20703@linux.vnet.ibm.com> <503F95A9.8030604@twiddle.net>
On 30/08/12 18:32, Richard Henderson wrote:
> On 08/30/2012 08:13 AM, Andreas Krebbel wrote:
>>> There's no reason to *ever* drop a hidden function pointer
>>> into the constant pool, since LARL is always available.
>>
>> The problem was that I unfortunately failed to mark the references
>> hidden in a way visible to GCC :(
>
> Ah, well, that'll do it every time. ;-)
>
> All that said, it's probably reasonable to have the linker transform
>
> lg r1,x@got(b2)
> or
> lgrl r1,x@got
> into
> larl r1,x
>
> when it finds that X has appropriate alignment and has been made
> hidden via linker maps or other mechanisms not visible to the compiler.
Yes. I'll work on this. Unfortunately it will not cover what is currently the most common case:
-march below z10 combined with -fPIC. In that case we generate:
  larl %r1,foo@GOTENT
  lg   %r1,0(%r1)
Starting with z10, however, that can easily be optimized as you say. While looking at it I noticed
that we currently don't use load-relative instructions for GOT slots. This is fixed with this patch:
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00088.html
> The use of lgrl would be an enhancement to gcc for use on z10+ to
> completely eliminate the got register. Freeing up another register
> definitely seems like a useful optimization.
I tried this when implementing the z10 support. Unfortunately, in most cases using a GOT pointer is
faster, since all the instructions support the <disp>(r12) address format and can therefore deal
with the GOT slot directly. Without a GOT pointer, additional load-relative instructions have to be
inserted into the code.
Bye,
-Andreas-
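
As an illustration of the point at the top of this message (marking references hidden in a way
that is visible to GCC), here is a minimal C sketch. The names my_memcpy_impl, memcpy_fn and
select_memcpy are hypothetical and are not taken from the glibc patch; the sketch only assumes
GCC's visibility attribute. With the attribute on the declaration the compiler knows that the
symbol binds locally, so on S/390 it should be able to take the address with LARL rather than
loading it from the GOT or the constant pool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal routine. The hidden visibility is stated on the
   declaration itself, so the compiler (not only the linker) knows that
   the symbol is never preempted or exported from the shared object.  */
__attribute__ ((visibility ("hidden")))
void *my_memcpy_impl (void *dst, const void *src, size_t n);

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical selector, loosely modelled on what an IFUNC resolver
   does: it returns the address of the hidden implementation.  */
memcpy_fn
select_memcpy (void)
{
  return my_memcpy_impl;
}

/* Stand-in body so the sketch links on its own. In a real library the
   definition would live in a separate file (typically assembler).  */
void *
my_memcpy_impl (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}
```

If the symbol were instead hidden only through a linker version script, the compiler would still
have to assume a preemptible symbol when building with -fPIC, which is the situation the linker
transformation discussed above is meant to improve.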