This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Save fp registers on x64 function resolution.
- From: Andreas Jaeger <aj at suse dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 26 Jul 2013 17:37:06 +0200
- Subject: Re: [PATCH] Save fp registers on x64 function resolution.
- References: <20130726091501 dot GA7231 at domone dot kolej dot mff dot cuni dot cz>
On 07/26/2013 11:15 AM, Ondřej Bílka wrote:
> Hi, since having to manually save xmm registers has caused many issues
> recently (memset issues, bug 15786...), this patch saves xmm registers.
> If you accept it for 2.19, further cleanups will follow.
The question here is whether we state that the resolver and any
functions it calls - including IFUNC resolvers - are allowed to touch
xmm registers or not. In the past, we were fine with not touching xmm
registers - but now with IFUNC, it is certainly safer to save xmm as
well. But that comes at a cost.
At minimum, we should document what IFUNCs are and are not allowed to do.
> We could also add register saving for other architectures.
>
> As far as performance is concerned, not saving registers looks like
> saving in the wrong place. It prevents the _dl_fixup code from using SSE
> function variants, which could cause a bigger slowdown than what is
> gained by not saving registers.
>
> I do not have measurements yet; that would require adding rdtsc to
> _dl_fixup as it is now and to a _dl_fixup built with rtld-*.S, -mno-sse
> and other hacks removed.
>
> Comments?
Regarding the code itself:
We already do a "subq $56,%rsp" to reserve stack space, so let's combine
that with the "subq $128,%rsp" you have. This also deserves a comment
explaining why it is needed.
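To make the arithmetic concrete, here is a rough sketch in C of one
possible combined layout. The names (GPR_SAVE_BYTES, XMM_SAVE_BYTES,
xmm_slot, shifted) are made up for illustration and are not glibc
identifiers. Putting the xmm area at the bottom of the frame is an
assumption, chosen only so that the movdqu offsets in the patch stay as
they are; other layouts would work as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants describing a single combined stack reservation. */
enum {
    GPR_SAVE_BYTES = 56,      /* the existing "subq $56,%rsp" */
    XMM_SAVE_BYTES = 8 * 16,  /* %xmm0-%xmm7, 16 bytes each: the patch's 128 */
    FRAME_BYTES    = GPR_SAVE_BYTES + XMM_SAVE_BYTES  /* one "subq $184,%rsp" */
};

static_assert(FRAME_BYTES == 184, "combined frame is 56 + 128 bytes");

/* Offset of the save slot for %xmmN, assuming the xmm area starts at
   0(%rsp) as it does in the patch after its own subq. */
static inline size_t xmm_slot(unsigned n)
{
    return (size_t)n * 16;
}

/* An offset that was valid after the old "subq $56,%rsp" alone moves up
   by the size of the xmm area once both reservations are merged. */
static inline size_t shifted(size_t old_offset)
{
    return old_offset + XMM_SAVE_BYTES;
}
```

Under these assumptions the saved %r9 moves from 48(%rsp) to 176(%rsp),
and the two values pushed by the PLT are read from 184(%rsp) and
192(%rsp) instead of 56(%rsp) and 64(%rsp).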
Andreas
>
> diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> index 5770c64..354b17c 100644
> --- a/sysdeps/x86_64/dl-trampoline.S
> +++ b/sysdeps/x86_64/dl-trampoline.S
> @@ -42,7 +42,29 @@ _dl_runtime_resolve:
> movq %r9, 48(%rsp)
> movq 64(%rsp), %rsi # Copy args pushed by PLT in register.
> movq 56(%rsp), %rdi # %rdi: link_map, %rsi: reloc_index
> - call _dl_fixup # Call resolver.
> +
> + subq $128, %rsp
> + cfi_adjust_cfa_offset(128)
> + movdqu %xmm0, (%rsp)
> + movdqu %xmm1, 16(%rsp)
> + movdqu %xmm2, 32(%rsp)
> + movdqu %xmm3, 48(%rsp)
> + movdqu %xmm4, 64(%rsp)
> + movdqu %xmm5, 80(%rsp)
> + movdqu %xmm6, 96(%rsp)
> + movdqu %xmm7, 112(%rsp)
> + call _dl_fixup # Call resolver.
> + movdqu (%rsp), %xmm0
> + movdqu 16(%rsp), %xmm1
> + movdqu 32(%rsp), %xmm2
> + movdqu 48(%rsp), %xmm3
> + movdqu 64(%rsp), %xmm4
> + movdqu 80(%rsp), %xmm5
> + movdqu 96(%rsp), %xmm6
> + movdqu 112(%rsp), %xmm7
> + addq $128, %rsp
> + cfi_adjust_cfa_offset(-128)
> +
> movq %rax, %r11 # Save return value
> movq 48(%rsp), %r9 # Get register content back.
> movq 40(%rsp), %r8
>
--
Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126