This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: CPU dispatching in libc


Ryan S. Arnold wrote:
>Agner wrote:
>> Does such a CPU dispatching exist in libc? How does it work? It should
>>be possible to compile a static binary on a system with SSE-whatever,
>>and run it on a system with SSE-something-else. Therefore, I want the
>>CPU-dispatching to be inside libc.

>We (IBM) had discussions with AMD and Intel at the 2007 GCC Summit where
>they indicated that they were interested in dynamic runtime checks for
>hardware capability which would route the application to the correct CPU
>optimized function implementation while the application was running by
>using a first-time-called hwcap check.
>The 'first-time-called' hwcap check would work by having a wrapper
>function check to see if it had an internal function pointer set for an
>optimized version of the function. If not, then it'd check the hwcap
>for the specific platform information, find the correct function pointer
>and set it. Subsequent calls wouldn't pay this resolution
>penalty. I'm not sure if they made any progress on this. H.J. Lu at
>Intel would probably be able to tell you.

A framework for CPU dispatching must be in place before any progress can be made, so this is the reason why the memory and string functions in libc are so slow. What are you doing with the math functions? Most other libraries use SSE2 for math functions when it is available. I can't find the math functions in libc, so I don't know what is being done there.

>You should contact H.J. Lu (via email and CC this mailing list) and ask
>him if they made any progress with their 'first-time-called'
>optimization checks idea.

I have CC'ed this mail to him.

If CPU dispatching is not implemented yet, here is my proposal for an efficient mechanism:
The function entry has JMP POINTER where POINTER is a pointer stored in the data segment.
POINTER initially points to a dispatcher. The dispatcher calls a function WhichInstructionSetDoIHave. According to the value returned, it changes POINTER to point to the optimal version of the code and then jumps to [POINTER]. The next time the function is called, it goes through POINTER directly to the optimal version. The cost of dispatching is then just a single instruction, except on the first call. (A 32-bit position-independent version needs to get a reference point into ecx first, via a thunk.)


The most probable path should be immediately after JMP POINTER.
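
The mechanism above can be sketched in C rather than assembly, with a function pointer standing in for POINTER. This is only an illustration under stated assumptions: the names my_strlen, my_strlen_generic, my_strlen_sse2 and the ISET_* levels are hypothetical, the "SSE2" variant is a placeholder that just calls strlen, and which_instruction_set is a stub standing in for WhichInstructionSetDoIHave.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical instruction-set levels, for illustration only. */
enum { ISET_GENERIC = 1, ISET_SSE2 = 2 };

/* Stub standing in for WhichInstructionSetDoIHave; a real version
   would query the CPU. Here it always reports the generic level. */
static int which_instruction_set(void) { return ISET_GENERIC; }

/* Two versions of the same function. The "SSE2" one is a placeholder. */
static size_t my_strlen_generic(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

static size_t my_strlen_sse2(const char *s) { return strlen(s); }

static size_t my_strlen_dispatch(const char *s);

/* POINTER: lives in the data segment, initially aims at the dispatcher. */
static size_t (*my_strlen_ptr)(const char *) = my_strlen_dispatch;

/* Runs on the first call only: pick a version, store it in POINTER,
   then continue into the chosen version. */
static size_t my_strlen_dispatch(const char *s)
{
    my_strlen_ptr = (which_instruction_set() >= ISET_SSE2)
                        ? my_strlen_sse2
                        : my_strlen_generic;
    return my_strlen_ptr(s);
}

/* Public entry point: one indirect call (JMP POINTER in the assembly form). */
size_t my_strlen(const char *s) { return my_strlen_ptr(s); }
```

The first call to my_strlen("abc") goes through the dispatcher and returns 3; later calls go straight to the selected version. The sketch ignores thread safety: two threads making the first call at the same time would both store the same value, but a portable C version would want an atomic store for the pointer.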

The WhichInstructionSetDoIHave function reads its value from a variable CurrentInstructionSet in the data segment. This variable is initially zero, indicating that it must use CPUID etc. to determine the instruction set. It is possible to detect whether XMM registers are enabled by using the FXSAVE/FXRSTOR instructions rather than asking the operating system or catching an exception. This will make it easier to port libc to different operating systems.

For testing purposes, it should be possible to change the value of CurrentInstructionSet. Set it to a lower value to test the versions for older instruction sets, or to a higher value to test newer versions if you have an emulator for that instruction set.
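
A minimal sketch of the cached detection and the test override, assuming GCC or Clang on x86 (the cpuid.h header and its __get_cpuid helper are toolchain features, not part of libc). The ISET_* level numbering is hypothetical. The sketch only reads the CPUID feature bits; it does not perform the FXSAVE/FXRSTOR check for whether XMM registers are enabled.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper for the CPUID instruction */
#endif

/* Hypothetical level numbering; zero means "not yet detected". */
enum {
    ISET_UNDETECTED = 0,
    ISET_BASELINE   = 1,
    ISET_SSE        = 2,
    ISET_SSE2       = 3,
    ISET_SSE3       = 4
};

/* Lives in the data segment; test code may overwrite it. */
int CurrentInstructionSet = ISET_UNDETECTED;

/* Query CPUID leaf 1 and map the feature bits to a level.
   Each level requires all lower levels to be present. */
static int detect_instruction_set(void)
{
    int level = ISET_BASELINE;
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        if (edx & (1u << 25))                        /* EDX bit 25: SSE  */
            level = ISET_SSE;
        if (level == ISET_SSE && (edx & (1u << 26))) /* EDX bit 26: SSE2 */
            level = ISET_SSE2;
        if (level == ISET_SSE2 && (ecx & 1u))        /* ECX bit 0: SSE3  */
            level = ISET_SSE3;
    }
#endif
    return level;
}

/* Detect on first use, then return the cached (or overridden) value. */
int WhichInstructionSetDoIHave(void)
{
    if (CurrentInstructionSet == ISET_UNDETECTED)
        CurrentInstructionSet = detect_instruction_set();
    return CurrentInstructionSet;
}
```

To force an older code path in a test, assign for example CurrentInstructionSet = ISET_SSE before the first call to any dispatched function. A function pointer that has already been resolved keeps its target, so an override made afterwards has no effect on it.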

