x86 CPU features detection for applications (and AMX)

Thiago Macieira thiago.macieira@intel.com
Wed Jun 30 15:36:36 GMT 2021


On Wednesday, 30 June 2021 05:50:30 PDT Enrico Weigelt, metux IT consult 
wrote:
> > No, but because it's register state and part of XSAVE, it has immediate
> > impact in ABI. In particular, the signal stack layout includes XSAVE (as
> > does ptrace()).
> 
> OMGs, I've already suspected such sickness. I don't even dare thinking
> about consequences for compilers and library ABIs.
> 
> Does anyone here know why they designed this as inline operations ? This
> thing seems to be pretty much what typical TPUs are doing (or a subset
> of it). Why not just adding a TPU next to the CPU on the same chip ?

To be clear: this is a SW ABI. It has nothing to do with the presence or
absence of other processing units in the system.

The moment you receive a Unix signal with SA_SIGINFO, the mcontext state needs 
to be saved somewhere. Where would you save it? Please remember that:

- signal handlers can be called at any point in the execution, including
  in the middle of malloc()
- signal handlers can longjmp out of the handler back into non-handler code
- in a multithreaded application, each thread can be handling a signal 
  simultaneously

We could have the kernel hold on to that state and add a system call to
extract it, but that's an ABI change and I don't think it would work for the
longjmp case.
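
To make the constraints above concrete, here is a minimal sketch (not taken
from any real code base; resume_point, on_signal and demo_signal_context are
names made up for this example). It installs an SA_SIGINFO handler, shows that
the handler is handed a ucontext_t holding the saved machine context, and then
leaves the handler with siglongjmp() instead of returning:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf resume_point;            /* hypothetical name */
static volatile sig_atomic_t saw_context;  /* 1 if the handler got a context */

/* SA_SIGINFO handler: the third argument points to a ucontext_t that the
 * kernel wrote onto the signal stack before calling us. On x86-64 Linux,
 * uc_mcontext is where the saved register state is reachable from. */
static void on_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)info;
    ucontext_t *uc = ctx;
    saw_context = (uc != NULL && (void *)&uc->uc_mcontext != NULL);

    /* Leave the handler without returning through the kernel: the saved
     * frame on the signal stack is simply abandoned. */
    siglongjmp(resume_point, 1);
}

/* Returns 1 if the handler ran, saw a context and control came back through
 * siglongjmp(); returns 0 otherwise. */
int demo_signal_context(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    saw_context = 0;
    if (sigsetjmp(resume_point, 1) == 0) {
        raise(SIGUSR1);   /* the handler jumps away, so this never returns */
        return 0;
    }
    return saw_context ? 1 : 0;
}
```

Because the handler never returns, a scheme where the kernel keeps the saved
state on the process's behalf would have no obvious point at which to release
it, which is the longjmp problem mentioned above.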

> > Userspace will have to do something like:
> >   - check CPUID, if !AMX -> fail
> >   - issue prctl(), if error -> fail
> >   - issue XGETBV and check the AMX bit is set, if not -> fail
> 
> Can't we do this with just a prctl() call ?
> IOW: ask the kernel, who gonna say yes or no.

That's possible. The kernel can't enable AMX state on a system without AMX,
so a successful prctl() would already imply that the CPU supports it.
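
For illustration, the three-step sequence quoted above could be sketched as
follows. This is only a sketch under assumptions: the prctl() interface was
still under discussion in this thread, so the code uses the
arch_prctl(ARCH_REQ_XCOMP_PERM) request that later Linux kernels (5.16 and
newer) provide for x86-64. The enum and the names amx_request and read_xcr0
are made up for the example.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Value from the Linux uapi header asm/prctl.h (Linux 5.16+); repeated here
 * as an assumption so the sketch builds against older headers. */
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023
#endif
#define XFEATURE_XTILECFG  17   /* XCR0 bit for the AMX tile configuration */
#define XFEATURE_XTILEDATA 18   /* XCR0 bit for the AMX tile data */

/* Hypothetical result codes for this example. */
enum amx_status {
    AMX_OK,
    AMX_NO_CPU_SUPPORT,
    AMX_KERNEL_REFUSED,
    AMX_NOT_ENABLED_IN_XCR0
};

#if defined(__x86_64__)
/* Read XCR0 with XGETBV (ECX = 0). */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

enum amx_status amx_request(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    const uint64_t need = (1ull << XFEATURE_XTILECFG) |
                          (1ull << XFEATURE_XTILEDATA);

    /* Step 1: CPUID leaf 7, subleaf 0; EDX bit 24 is AMX-TILE. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
        !(edx & (1u << 24)))
        return AMX_NO_CPU_SUPPORT;

    /* Step 2: ask the kernel for permission to use the tile data state. */
    if (syscall(SYS_arch_prctl, (long)ARCH_REQ_XCOMP_PERM,
                (long)XFEATURE_XTILEDATA) != 0)
        return AMX_KERNEL_REFUSED;

    /* Step 3: XGETBV is only usable when OSXSAVE is set (CPUID leaf 1,
     * ECX bit 27); then check that both AMX bits are enabled in XCR0. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return AMX_NOT_ENABLED_IN_XCR0;
    if ((read_xcr0() & need) != need)
        return AMX_NOT_ENABLED_IN_XCR0;

    return AMX_OK;
#else
    return AMX_NO_CPU_SUPPORT;   /* not an x86-64 build */
#endif
}
```

If the kernel request alone is treated as authoritative, as suggested in the
question, steps 1 and 3 become redundant sanity checks and could be dropped.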

> Are there any situations where kernel says yes, but process still can't
> use it ? Why so ?

Today there is no such case that I can think of.

> >   - request the signal stack size / spawn threads
> 
> Signal stack is separate from the usual stack, right ?
> Why can't this all be done in one shot ?

Yes, we're talking about the sigaltstack() call.
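
As a sketch of what "request the signal stack size" can look like in practice
(an assumption on my part, not something settled in this thread): newer Linux
kernels report the minimum signal frame size through the AT_MINSIGSTKSZ auxv
entry, and the application sizes its alternate stack from that instead of from
a compile-time constant. The function name install_alt_stack and the choice of
adding SIGSTKSZ as headroom are made up for this example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <sys/auxv.h>
#include <unistd.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51   /* Linux auxv key; assumed missing from old headers */
#endif

/* Allocate and install an alternate signal stack for the calling thread.
 * Returns the installed size in bytes, or 0 on failure. */
size_t install_alt_stack(void)
{
    /* Frame size the kernel needs for this CPU's register state; getauxval()
     * returns 0 when the kernel does not provide the entry. */
    unsigned long kernel_min = getauxval(AT_MINSIGSTKSZ);

    /* Kernel frame plus the traditional SIGSTKSZ as room for the handler. */
    size_t size = (size_t)SIGSTKSZ + (size_t)kernel_min;

    stack_t ss;
    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return 0;
    ss.ss_size = size;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return 0;
    }
    return size;
}
```

The alternate stack is a per-thread setting, so every thread that wants its
handlers to run on one has to make this call itself, and a handler only uses
it when it was installed with SA_ONSTACK.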

What is "this all" in the sentence above?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





More information about the Libc-alpha mailing list