cygwin 3.5.4-1: signal handling destroys 'long double' values

Mon Oct 14 05:59:40 GMT 2024

On Mon, 14 Oct 2024 14:29:58 +0900
Takashi Yano wrote:
> Hi Brian,
> 
> Thanks for the detailed explanation.
> 
> On Sun, 13 Oct 2024 16:19:31 -0600
> Brian Inglis wrote:
> > On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > > Hi Brian
> > > 
> > > On Sun, 13 Oct 2024 10:41:58 -0600
> > > Brian Inglis wrote:
> > >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> > >>> Hi Brian,
> > >>>
> > >>> On Tue, 8 Oct 2024 10:37:14 -0600
> > >>> Brian Inglis wrote:
> > >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> > >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> > >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> > >>>>>> Christian Franke wrote:
> > >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> > >>>>>>>
> > >>>>>>> $ ./sigtest > out.txt
> > >>>>>>> (press ^C 42x :-)
> > >>>>>>>
> > >>>>>>> $ sort out.txt | uniq -c
> > >>>>>>>           3 x = 0x1.23456789p+0, y = -nan, d = -nan
> > >>>>>>>           6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> > >>>>>>>          33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> > >>>>>>>
> > >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> > >>>>>>> problem occurs if compiled with -DWORKS, which suggests that only 'long
> > >>>>>>> double' is affected.
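The original sigtest.c is not reproduced in this thread, so the following is only an independent sketch of a similar test, not Christian's program. It repeatedly computes d = x - y with x == y == 0x1.23456789p+0 while SIGALRM fires every millisecond, and counts iterations where d is not exactly zero (a NaN also counts). The function name count_corruptions(), the 1 ms setitimer() interval and the handler body are all assumptions. On x86_64, GCC typically keeps such 'long double' values in x87 registers, so on a system whose signal delivery preserves the full FPU state the count should be 0; an affected Cygwin build would be expected to report nonzero counts.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical reproducer in the spirit of sigtest.c (not the original). */

static volatile sig_atomic_t alarms;

/* SIGALRM handler: does a little long double arithmetic of its own so the
   x87 unit is used inside the handler too, then counts the signal. */
static void on_alarm(int sig)
{
    volatile long double t = 3.0L;
    t = t / 7.0L;
    (void)sig;
    alarms++;
}

/* Computes x - y (with x == y) 'iterations' times while SIGALRM fires every
   millisecond; returns how many results were not exactly zero, or -1 if the
   signal or timer setup failed. */
static long count_corruptions(long iterations)
{
    static volatile long double src = 0x1.23456789p+0L;
    struct itimerval on = {
        .it_interval = { .tv_sec = 0, .tv_usec = 1000 },  /* 1 ms period */
        .it_value    = { .tv_sec = 0, .tv_usec = 1000 },
    };
    struct itimerval off;
    struct sigaction sa;
    long bad = 0;

    memset(&off, 0, sizeof off);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0 ||
        setitimer(ITIMER_REAL, &on, NULL) != 0)
        return -1;

    for (long i = 0; i < iterations; i++) {
        long double x = src, y = src;  /* two separate volatile loads */
        long double d = x - y;         /* must be exactly 0 */
        if (d != 0.0L)                 /* also true when d is NaN */
            bad++;
    }

    setitimer(ITIMER_REAL, &off, NULL);
    /* The handler stays installed, so an already-pending SIGALRM is harmless. */
    return bad;
}
```

A caller would simply check that count_corruptions(3000000L) returns 0.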
> > >>>>>>
> > >>>>>> Thanks for the report. I looked into this problem and may have found
> > >>>>>> the cause. It seems to be due to a bug in scripts/gendef, which
> > >>>>>> generates the signal handler caller (sigfe.s) that stores/restores the
> > >>>>>> registers.
> > >>>>>>
> > >>>>>> In sigdelayed, the control word is saved/restored with the fnstcw/fldcw
> > >>>>>> instructions; however, the fninit instruction resets the rest of the
> > >>>>>> x87 FPU state, including the status and tag words.
> > >>>>>>
> > >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> > >>>>>> However, I'm not familiar with x87 instructions, so I may be overlooking
> > >>>>>> something.
> > >>>>>>
> > >>>>>> Could anyone with expertise in x87 instructions and the sigfe code
> > >>>>>> comment on this?
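The failure mode can be shown in isolation. The sketch below (GCC/Clang inline assembly, x86_64 only; the helper name is hypothetical and this is not the actual sigfe.s code) loads a long double into ST(0), runs a control-word-only fnstcw / fninit / fldcw sequence, and then pops ST(0) back to memory. Because FNINIT tags every x87 register as empty, the final pop is a stack underflow; with the invalid-operation exception masked (the usual default) the CPU stores the negative QNaN "real indefinite", which is consistent with the -nan values in the sigtest output.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#define X87_DEMO 1
/* Hypothetical helper mimicking a control-word-only save/restore around
   FNINIT. noinline keeps the x87 stack empty at the call boundary. */
__attribute__((noinline)) static long double through_fninit(long double in)
{
    long double out;
    uint16_t cw;

    __asm__ volatile(
        "fldt   %2\n\t"  /* push 'in': ST(0) = in                           */
        "fnstcw %1\n\t"  /* save only the control word                      */
        "fninit\n\t"     /* reset FPU: TOP = 0, all registers tagged empty  */
        "fldcw  %1\n\t"  /* restore the control word                        */
        "fstpt  %0\n\t"  /* pop ST(0), now empty: stack underflow           */
        : "=m"(out), "=m"(cw)
        : "m"(in));
    return out;
}
#else
#define X87_DEMO 0
/* No x87 on this target: just pass the value through. */
static long double through_fninit(long double in) { return in; }
#endif
```

On x86_64, through_fninit(0x1.23456789p+0L) should come back as a NaN rather than the input.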
> > >>>>>
> > >>>>> AIUI x87 FP handling is outdated and mostly unused on current systems,
> > >>>>> which rely on much more than the legacy x87 instructions and register stack.
> > >>>>>
> > >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> > >>>>> modern approaches.
> > >>>>>
> > >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> > >>>>
> > >>>> This is basically what ISTR:
> > >>>>
> > >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> > >>>>
> > >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> > >>>> SSE... instructions and XMM registers are used.
> > >>>
> > >>> Thanks for the advice. I read through the web pages and related documents
> > >>> and sent a patch that uses fxsave/fxrstor and xsave/xrstor to the
> > >>> cygwin-patches@cygwin.com mailing list.
> > >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> > >>>
> > >>> Is this as you intended?
> > >>
> > >> That seems to be the preferred approach now, as long as you can correctly
> > >> determine adequate space for fxsave and xsave, given the varying feature sets,
> > >> register counts, and register sizes of recent processors:
> > >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> > > 
> > > Thanks for checking.
> > > 
> > > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > > fxsave uses a fixed 512-byte memory area to save the current
> > > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> > > 
> > > The patch allocates 0x238 bytes:
> > >   0x200 (512 bytes): fxsave area
> > >   0x008 (  8 bytes): for 16-byte alignment
> > >   0x010 ( 16 bytes): work area
> > >   0x020 ( 32 bytes): reserved for later processing
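The 0x238-byte total can be checked with plain arithmetic. This sketch restates the four components as constants (the names are hypothetical, not taken from the patch). It also assumes, as an inference from the 8 bytes of slack rather than anything stated in the thread, that the frame base is at least 8-byte aligned; under that assumption at most 8 bytes of padding are needed to reach the 16-byte alignment FXSAVE requires.

```c
#include <stdint.h>

/* Hypothetical names for the components listed above. */
enum {
    FXSAVE_AREA_SIZE = 0x200, /* fixed 512-byte FXSAVE image         */
    FX_ALIGN_SLACK   = 0x008, /* padding to reach 16-byte alignment  */
    FX_WORK_AREA     = 0x010,
    FX_RESERVED      = 0x020,
    FX_FRAME_SIZE    = FXSAVE_AREA_SIZE + FX_ALIGN_SLACK + FX_WORK_AREA + FX_RESERVED
};

_Static_assert(FX_FRAME_SIZE == 0x238, "components add up to 0x238 bytes");

/* Bytes to skip from 'base' to the next 16-byte boundary. Assumption: base
   is 8-byte aligned, so the result is always 0 or 8. */
static unsigned fx_align_padding(uintptr_t base)
{
    return (unsigned)((16 - (base & 15)) & 15);
}
```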
> > 
> > That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> > Please also note that 64 bit operands or REX prefix must be used with 
> > FXSAVE/FXRSTOR to save expanded state rather than legacy state.
> 
> Fixed.
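By contrast, a full state save keeps the x87 register stack intact. This sketch (same assumptions: x86_64, GCC/Clang inline assembly, hypothetical helper name) brackets the same fninit with fxsave64/fxrstor64, i.e. FXSAVE/FXRSTOR with the REX.W prefix, using a 512-byte, 16-byte-aligned buffer. Here fninit merely stands in for whatever a signal handler might do to the FPU.

```c
#if defined(__x86_64__) && defined(__GNUC__)
#define FXSAVE_DEMO 1
/* Hypothetical helper: full x87/MMX/SSE state save/restore around FNINIT. */
__attribute__((noinline)) static long double through_fxsave64(long double in)
{
    long double out;
    unsigned char area[512] __attribute__((aligned(16))); /* FXSAVE image */

    __asm__ volatile(
        "fldt      %2\n\t"  /* push 'in': ST(0) = in                       */
        "fxsave64  %1\n\t"  /* save ST0-7, tags, TOP, CW, MXCSR, XMM0-15   */
        "fninit\n\t"        /* stand-in for a handler clobbering the FPU   */
        "fxrstor64 %1\n\t"  /* restore it all: ST(0) is valid again        */
        "fstpt     %0\n\t"  /* pop ST(0): yields 'in' unchanged            */
        : "=m"(out), "=m"(area)
        : "m"(in));
    return out;
}
#else
#define FXSAVE_DEMO 0
/* No x87 on this target: just pass the value through. */
static long double through_fxsave64(long double in) { return in; }
#endif
```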
> 
> > > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > > the cpuid instruction with eax=0dh and ecx=00h returns in ebx the
> > > maximum size required by xsave. So the patch allocates
> > > ebx + 0x048 bytes:
> > >   0x018 ( 24 bytes): for 64-byte alignment
> > >   0x010 ( 16 bytes): work area
> > >   0x020 ( 32 bytes): reserved for later processing
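The xsave frame follows the same pattern, with CPUID's EBX value as the variable part. A minimal sketch with hypothetical names; the value 0x340 is only an illustrative EBX, the standard-format size typically reported when x87, SSE and AVX are enabled (512-byte legacy area + 64-byte XSAVE header + 256 bytes for the upper YMM halves), not a measurement from the thread.

```c
#include <stdint.h>

/* Hypothetical names for the fixed overhead listed above. */
enum {
    XS_ALIGN_SLACK = 0x018, /* padding to reach 64-byte alignment */
    XS_WORK_AREA   = 0x010,
    XS_RESERVED    = 0x020,
    XS_OVERHEAD    = XS_ALIGN_SLACK + XS_WORK_AREA + XS_RESERVED
};

_Static_assert(XS_OVERHEAD == 0x048, "fixed overhead is 0x048 bytes");

/* Frame size for a given CPUID.(EAX=0Dh,ECX=0):EBX value. */
static uint32_t xsave_frame_size(uint32_t cpuid_ebx)
{
    return cpuid_ebx + XS_OVERHEAD;
}

/* Illustrative EBX for x87+SSE+AVX: legacy area + header + YMM_Hi128. */
#define EXAMPLE_EBX_X87_SSE_AVX (512u + 64u + 256u)
```

With that example input, xsave_frame_size() gives 0x340 + 0x48 = 0x388 bytes.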
> > 
> > That is the size for the features currently enabled in the XCR0 user state,
> > not the size (returned in ecx) for all possible registers of all features
> > that are supported and may be enabled and in use.
> > You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual 
> > features may require more.
> 
> Do you mean we should use the ecx value rather than the ebx value returned
> by cpuid (eax=0dh, ecx=0)? I did not understand the difference between the
> ebx and ecx values returned by cpuid.
> 
> Fixed.
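To see the values side by side, here is a sketch that reads CPUID leaf 0Dh, sub-leaf 0, through the __get_cpuid() and __get_cpuid_count() helpers from GCC/Clang's <cpuid.h> (x86 only; the struct and function names are hypothetical). Per Intel's documentation, EBX is the XSAVE area size for the features currently enabled in XCR0, ECX is the size needed if every supported feature were enabled, and EDX:EAX is the bitmap of state components XCR0 may enable. EBX can therefore never exceed ECX.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Hypothetical container for CPUID.(EAX=0Dh,ECX=0) results. */
struct xsave_sizes {
    uint32_t enabled_size;   /* EBX: size for features enabled in XCR0     */
    uint32_t supported_size; /* ECX: size if all supported were enabled    */
    uint64_t supported_mask; /* EDX:EAX: XCR0 bits the processor supports  */
};

/* Returns 1 and fills *out when XSAVE is supported, otherwise 0. */
static int query_xsave_sizes(struct xsave_sizes *out)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.1:ECX bit 26 reports XSAVE support. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 26)))
        return 0;
    /* Fails if the maximum basic leaf is below 0Dh. */
    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    out->enabled_size = ebx;
    out->supported_size = ecx;
    out->supported_mask = ((uint64_t)edx << 32) | eax;
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

EBX comes out smaller than ECX only when some supported state components are not enabled in XCR0.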

On second thought, isn't it unnecessary to use the ecx value? The patch
passes the EDX:EAX value of cpuid(0d,0) to xsave as its instruction mask,
which means that only the features enabled in XCR0 are saved. Features
not enabled in XCR0 cannot be used in user mode, so we do not need to
store their state.
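This reasoning can be checked directly. XSAVE acts on the requested-feature bitmap RFBM = XCR0 AND EDX:EAX, and XCR0 can only contain supported bits, so passing CPUID's supported mask in EDX:EAX makes RFBM equal to XCR0. The sketch below (x86 with GCC/Clang only; hypothetical names) reads both masks, using XGETBV with ECX = 0 only after checking CPUID.1:ECX.OSXSAVE, and computes RFBM.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_X86_XGETBV 1
#else
#define HAVE_X86_XGETBV 0
#endif

/* XSAVE saves only the requested-feature bitmap: XCR0 AND EDX:EAX. */
static uint64_t xsave_rfbm(uint64_t xcr0, uint64_t edx_eax)
{
    return xcr0 & edx_eax;
}

/* Hypothetical helper: reads XCR0 and CPUID.(EAX=0Dh,ECX=0):EDX:EAX.
   Returns 1 on success, 0 where the information is unavailable. */
static int read_xsave_masks(uint64_t *xcr0, uint64_t *supported)
{
#if HAVE_X86_XGETBV
    unsigned int eax, ebx, ecx, edx, lo, hi;

    /* CPUID.1:ECX bit 27 (OSXSAVE): the OS enabled XSAVE, so XGETBV is legal. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;
    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    *supported = ((uint64_t)edx << 32) | eax;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u)); /* ECX=0: XCR0 */
    *xcr0 = ((uint64_t)hi << 32) | lo;
    return 1;
#else
    (void)xcr0;
    (void)supported;
    return 0;
#endif
}
```

If these assumptions hold, EBX already covers everything XSAVE writes in this configuration; the extra room ECX accounts for would belong to components that are never saved.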

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>