This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Save fp registers on x64 function resolution.


It's perfectly possible to be sure that the compiled code in the dynamic
linker itself doesn't touch SSE registers.  You just have to compile it
with -m options that tell the compiler to target an instruction set that
doesn't have those registers (-mno-sse, etc.).  But that's probably moot.

The dynamic linker may eventually call malloc et al, which might be
provided by the executable or another DSO with code that does touch the SSE
registers (or whatever analogous state on other machines).  Perhaps plain
PLT resolution can never lead to calling malloc, but TLSDESC calls
certainly can.  Then there's IFUNC, and LD_AUDIT.  (Have I forgotten any
other paths from dynamic linker code entered via a PLT that call out to
code outside the dynamic linker proper?)  Conceivably we could carefully
check all of these ways that control flow can leave the dynamic linker
after it's been entered through the PLT, and make them use an assembly
wrapper to save and restore the call-clobbered extra register state lazily,
only when actually leaving the "safe" zone of dynamic linker-internal code.
Offhand, that seems likely to be fragile to maintain and tedious, if not
difficult, to test thoroughly.
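
To make the lazy idea concrete, here is a rough sketch in plain C.  It is
only a simulation under stated assumptions: the real thing would have to be
an assembly wrapper, and every name here (ext_regs, live_regs, resolve_ctx,
call_out, leave_resolver, clobbering_callee) is hypothetical and not taken
from glibc.  The "register file" is just a global buffer standing in for
the call-clobbered extra register state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulated extra register state (hypothetical stand-in for SSE etc.). */
struct ext_regs { unsigned char bytes[64]; };
static struct ext_regs live_regs;

/* Per-resolution context: nothing is saved until we first call out. */
struct resolve_ctx {
  struct ext_regs saved;
  bool have_saved;
};

/* Stand-in for outside code (malloc, IFUNC resolver, audit hook) that
   clobbers the extra state. */
static void clobbering_callee(void) {
  memset(&live_regs, 0xff, sizeof live_regs);
}

/* Every exit from the "safe" zone goes through this wrapper, which saves
   the state on the first exit only. */
static void call_out(struct resolve_ctx *ctx, void (*fn)(void)) {
  assert(ctx != NULL);
  if (!ctx->have_saved) {
    ctx->saved = live_regs;
    ctx->have_saved = true;
  }
  fn();
}

/* On the way back to the PLT caller, restore only if something was saved. */
static void leave_resolver(struct resolve_ctx *ctx) {
  if (ctx->have_saved)
    live_regs = ctx->saved;
}
```

The fragility is visible even in the toy version: any path that reaches
outside code without going through call_out silently corrupts the caller's
state.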

But even if we wanted to do the work, there's still the question of what the
performance impact is of constraining the compiler to use less of the CPU
facilities in the dynamic linker code.  That has to be compared to the
overhead of PLT resolutions doing the save/restore (also in light of the
vastly simpler maintenance burden of the eager-saving approach).

If we go the eager route (which seems like the path of least resistance),
then for machines where there is some relevant register state that is only
conditionally available at runtime, it may well be the better trade-off not
to do the HWCAP check in _dl_runtime_resolve itself, but instead assemble
multiple variant _dl_runtime_resolve routines (with heavy use of gas
macros, one presumes) and choose once at startup which routine to point
each GOT[2] at.
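
For illustration only, the choose-once-at-startup scheme might look roughly
like the following C sketch.  The feature bits, the variant routines, and
the helper names (select_resolver, install_resolver) are all hypothetical;
the real variants would be assembly trampolines, and the real feature test
would come from whatever HWCAP data the machine provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; not the real HWCAP encoding. */
enum { FEATURE_SSE = 1u << 0, FEATURE_AVX = 1u << 1 };

typedef const char *(*resolver_fn)(void);

/* Stand-ins for the assembled _dl_runtime_resolve variants.  Each one
   just reports which variant it is. */
static const char *resolve_base(void) { return "base"; }
static const char *resolve_sse(void)  { return "sse"; }
static const char *resolve_avx(void)  { return "avx"; }

/* Decide once which variant matches the state available at runtime. */
static resolver_fn select_resolver(unsigned features) {
  if (features & FEATURE_AVX)
    return resolve_avx;
  if (features & FEATURE_SSE)
    return resolve_sse;
  return resolve_base;
}

/* Point slot 2 of every loaded object's GOT at the chosen routine, so no
   feature check runs on the resolution path itself. */
static void install_resolver(uintptr_t **gots, size_t ngots,
                             unsigned features) {
  assert(gots != NULL);
  resolver_fn fn = select_resolver(features);
  for (size_t i = 0; i < ngots; i++)
    gots[i][2] = (uintptr_t) fn;
}
```

The point of the arrangement is that the conditional is paid once, at
startup, rather than on every lazy PLT resolution.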


Thanks,
Roland

