cygwin 3.5.4-1: signal handling destroys 'long double' values

Sun Oct 13 22:19:31 GMT 2024

On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> Hi Brian
> 
> On Sun, 13 Oct 2024 10:41:58 -0600
> Brian Inglis wrote:
>> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
>>> Hi Brian,
>>>
>>> On Tue, 8 Oct 2024 10:37:14 -0600
>>> Brian Inglis wrote:
>>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
>>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
>>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
>>>>>> Christian Franke wrote:
>>>>>>> $ gcc -o sigtest -O2 sigtest.c
>>>>>>>
>>>>>>> $ ./sigtest > out.txt
>>>>>>> (press ^C 42x :-)
>>>>>>>
>>>>>>> $ sort out.txt | uniq -c
>>>>>>>           3 x = 0x1.23456789p+0, y = -nan, d = -nan
>>>>>>>           6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
>>>>>>>          33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
>>>>>>>
>>>>>>> The problem also occurs if compiled without -O2, but less often. No
>>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
>>>>>>> double' is affected.
>>>>>>
>>>>>> Thanks for the report. I looked into this problem and might find the
>>>>>> cause. It seems due to a bug of scripts/gendef. It generates signal
>>>>>> handler caller (sigfe.s) which stores/restores the registers.
>>>>>>
>>>>>> In sigdelayed, control word is stored/restored by fnstcw/fldcw instruction,
>>>>>> however, fninit instruction destroys some status registers in FPU (x87).
>>>>>>
>>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
>>>>>> However, I'm not familiar with x87 instructions, so I may overlook
>>>>>> something.
>>>>>>
>>>>>> Could anyone expert of x87 instructions and sigfe stuff give some
>>>>>> comments?
>>>>>
>>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
>>>>> current systems do more and use more than the legacy x87 instructions and stack.
>>>>>
>>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
>>>>> modern approaches.
>>>>>
>>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
>>>>
>>>> This is basically what ISTR:
>>>>
>>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
>>>>
>>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
>>>> SSE... instructions and XMM registers are used.
>>>
>>> Thanks for the advice. I read through the web pages and related documents
>>> and posted a patch which uses fxsave/fxrstor and xsave/xrstor to the
>>> cygwin-patches@cygwin.com mailing list.
>>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
>>>
>>> Is this as you intended?
>>
>> That seems to be the preferred approach now, as long as you can correctly
>> determine adequate space for fxsave and xsave, given the varying feature sets,
>> register counts, and register sizes of recent processors:
>> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> 
> Thanks for checking.
> 
> According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> fxsave uses 512 bytes fixed length memory to save the current
> state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> 
> The patch allocates 0x238 bytes:
>   0x200 (512 bytes): fxsave area
>   0x008 (  8 bytes): for 16-byte alignment
>   0x010 ( 16 bytes): work area
>   0x020 ( 32 bytes): reserved for later processing

That covers only the x87 FPU state, MMX state, MXCSR, and the 16 16-byte XMM
registers.
Please also note that FXSAVE/FXRSTOR must be used with a 64-bit operand size
(REX.W prefix, i.e. FXSAVE64/FXRSTOR64) to save the expanded state rather than
the legacy state.
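
For reference, the 512-byte image written by the 64-bit form of FXSAVE can be
pictured as a C struct. This is only a sketch: the struct, its field names, and
the helper fxsave_area_is_aligned() are made up for illustration and are not
taken from the Cygwin patch; the offsets follow the FXSAVE layout in the Intel
SDM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 512-byte FXSAVE64 (REX.W) image.
   Names are illustrative only; they do not come from Cygwin. */
struct fxsave_area {
    _Alignas(16) uint16_t fcw;  /* x87 control word, offset 0 */
    uint16_t fsw;               /* x87 status word */
    uint8_t  ftw;               /* abridged x87 tag word */
    uint8_t  reserved1;
    uint16_t fop;               /* last x87 opcode */
    uint64_t fip;               /* x87 instruction pointer (64-bit form) */
    uint64_t fdp;               /* x87 data pointer (64-bit form) */
    uint32_t mxcsr;             /* SSE control/status, offset 24 */
    uint32_t mxcsr_mask;
    uint8_t  st[8][16];         /* ST0-7/MM0-7: 10 bytes used per 16-byte slot */
    uint8_t  xmm[16][16];       /* XMM0-15, 16 bytes each */
    uint8_t  reserved2[96];     /* pads the image to 512 bytes */
};

/* The fixed size quoted from the Intel manual. */
static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE image is 512 bytes");

/* FXSAVE faults on a memory operand that is not 16-byte aligned. */
static int fxsave_area_is_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Nothing wider than the 128-bit XMM registers fits in this image, which is why
YMM/ZMM state needs the XSAVE area discussed next.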

> According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> cpuid instruction with eax=0dh and ecx=00h returns the maximum
> size required by xsave in ebx. So the patch allocates:
> ebx + 0x048 bytes.
>   0x018 ( 24 bytes): for 64-byte alignment
>   0x010 ( 16 bytes): work area
>   0x020 ( 32 bytes): reserved for later processing

That value in ebx is the size for the features currently enabled in XCR0 (user
state), not the size for all the features the processor supports, which is
returned in ecx; those may be enabled and come into use later.
Storing the 32 64-byte ZMM registers alone takes 2KB, and new real and virtual
features may require more.
It may be conservative, but I would suggest allocating the size returned in
ecx, as documented, in case of future changes; that can be reduced to 512 bytes
if only fxsave is supported.
I suggest you check for fxsave in cpuid leaf 1, edx bit 24, fall back to
fnsave/frstor if it is absent, and keep everything aligned to 64 bytes for
safety.
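
The checks suggested above could look roughly like the following C sketch. It
is not the gendef/sigfe code: plan_save_area(), query_save_plan(), and the
enum/struct names are hypothetical, and it assumes GCC or Clang for <cpuid.h>
(__get_cpuid and __get_cpuid_count). It also uses the OSXSAVE bit (leaf 1, ecx
bit 27) to decide whether xsave is usable, and 108 bytes for the fnsave image.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang CPUID helpers */
#endif

enum save_method { SAVE_FNSAVE, SAVE_FXSAVE, SAVE_XSAVE };

struct save_plan {
    enum save_method method;
    size_t bytes;               /* buffer size, rounded up to 64 bytes */
};

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Pure decision logic: xsave_max_bytes is CPUID.(EAX=0DH,ECX=0):ECX. */
static struct save_plan plan_save_area(int has_fxsr, int os_xsave,
                                       unsigned xsave_max_bytes)
{
    struct save_plan p;
    if (os_xsave && xsave_max_bytes >= 512 + 64) {
        p.method = SAVE_XSAVE;  /* legacy area + XSAVE header at minimum */
        p.bytes = align_up(xsave_max_bytes, 64);
    } else if (has_fxsr) {
        p.method = SAVE_FXSAVE; /* fixed 512-byte image */
        p.bytes = 512;
    } else {
        p.method = SAVE_FNSAVE; /* 108-byte x87-only image */
        p.bytes = align_up(108, 64);
    }
    return p;
}

/* Ask the running CPU; on non-x86 builds this only exercises the fallback. */
static struct save_plan query_save_plan(void)
{
    int has_fxsr = 0, os_xsave = 0;
    unsigned max_bytes = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        has_fxsr = (edx >> 24) & 1;     /* FXSAVE/FXRSTOR */
        os_xsave = (ecx >> 27) & 1;     /* OS-enabled XSAVE */
    }
    if (os_xsave && __get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        max_bytes = ecx;                /* all supported features */
#endif
    return plan_save_area(has_fxsr, os_xsave, max_bytes);
}
```

With the A10-9700 values shown below (fxsr, OS-enabled XSAVE, ecx = 960),
plan_save_area(1, 1, 960) selects xsave with a 960-byte buffer.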

For my AMD A10-9700 /proc/cpuinfo shows:

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid 
aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes 
*xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm 
perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep 
bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov

and /usr/bin/cpuid (package cpuid) shows (relevant lines marked with my added !):

...
    feature information (1/edx):
       x87 FPU on chip                        = true
       VME: virtual-8086 mode enhancement     = true
       DE: debugging extensions               = true
       PSE: page size extensions              = true
       TSC: time stamp counter                = true
       RDMSR and WRMSR support                = true
       PAE: physical address extensions       = true
       MCE: machine check exception           = true
       CMPXCHG8B inst.                        = true
       APIC on chip                           = true
       SYSENTER and SYSEXIT                   = true
       MTRR: memory type range registers      = true
       PTE global bit                         = true
       MCA: machine check architecture        = true
       CMOV: conditional move/compare instr   = true
       PAT: page attribute table              = true
       PSE-36: page size extension            = true
       PSN: processor serial number           = false
       CLFLUSH instruction                    = true
       DS: debug store                        = false
       ACPI: thermal monitor and clock ctrl   = false
       MMX Technology                         = true
!     FXSAVE/FXRSTOR                         = true
       SSE extensions                         = true
       SSE2 extensions                        = true
       SS: self snoop                         = false
       hyper-threading / multi-core supported = true
       TM: therm. monitor                     = false
       IA64                                   = false
       PBE: pending break event               = false
    feature information (1/ecx):
       PNI/SSE3: Prescott New Instructions     = true
       PCLMULDQ instruction                    = true
       DTES64: 64-bit debug store              = false
       MONITOR/MWAIT                           = true
       CPL-qualified debug store               = false
       VMX: virtual machine extensions         = false
       SMX: safer mode extensions              = false
       Enhanced Intel SpeedStep Technology     = false
       TM2: thermal monitor 2                  = false
       SSSE3 extensions                        = true
       context ID: adaptive or shared L1 data  = false
       SDBG: IA32_DEBUG_INTERFACE              = false
       FMA instruction                         = true
       CMPXCHG16B instruction                  = true
       xTPR disable                            = false
       PDCM: perfmon and debug                 = false
       PCID: process context identifiers       = false
       DCA: direct cache access                = false
       SSE4.1 extensions                       = true
       SSE4.2 extensions                       = true
       x2APIC: extended xAPIC support          = false
       MOVBE instruction                       = true
       POPCNT instruction                      = true
       time stamp counter deadline             = false
       AES instruction                         = true
       XSAVE/XSTOR states                      = true
!     OS-enabled XSAVE/XSTOR                  = true
       AVX: advanced vector extensions         = true
       F16C half-precision convert instruction = true
       RDRAND instruction                      = true
       hypervisor guest status                 = false
...
    XSAVE features (0xd/0):
       XCR0 valid bit field mask               = 0x4000000000000007
          x87 state                            = true
          SSE state                            = true
          AVX state                            = true
          MPX BNDREGS                          = false
          MPX BNDCSR                           = false
          AVX-512 opmask                       = false
          AVX-512 ZMM_Hi256                    = false
          AVX-512 Hi16_ZMM                     = false
          PKRU state                           = false
          XTILECFG state                       = false
          XTILEDATA state                      = false
       bytes required by fields in XCR0        = 0x00000340 (832)
!     bytes required by XSAVE/XRSTOR area     = 0x000003c0 (960)
       XSAVEOPT instruction                    = true
       XSAVEC instruction                      = false
       XGETBV instruction                      = false
       XSAVES/XRSTORS instructions             = false
       XFD: extended feature disable supported = false
       SAVE area size in bytes                 = 0x00000000 (0)
       IA32_XSS valid bit field mask           = 0x0000000000000000
          PT state                             = false
          PASID state                          = false
          CET_U user state                     = false
          CET_S supervisor state               = false
          HDC state                            = false
          UINTR state                          = false
          LBR state                            = false
          HWP state                            = false
    AVX/YMM features (0xd/2):
       AVX/YMM save state byte size             = 0x00000100 (256)
       AVX/YMM save state byte offset           = 0x00000240 (576)
       supported in IA32_XSS or XCR0            = XCR0 (user state)
       64-byte alignment in compacted XSAVE     = false
       XFD faulting supported                   = false
    LWP features (0xd/0x3e):
       LWP save state byte size                 = 0x00000080 (128)
       LWP save state byte offset               = 0x00000340 (832)
       supported in IA32_XSS or XCR0            = XCR0 (user state)
       64-byte alignment in compacted XSAVE     = false
       XFD faulting supported                   = false
...
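
The two sizes in the 0xd/0 output above can be reproduced from the per-feature
offsets and sizes. The sketch below is only arithmetic on the numbers in this
dump; the struct, function, and table names are hypothetical, and marking LWP
as not enabled is my inference from the 832-byte XCR0 figure.

```c
#include <assert.h>
#include <stddef.h>

/* One extended state component as reported by cpuid leaf 0xd, sub-leaf n. */
struct xsave_component {
    const char *name;
    unsigned offset;            /* byte offset in the standard XSAVE area */
    unsigned size;              /* byte size of the component */
    int enabled;                /* assumed: currently enabled in XCR0 */
};

enum { XSAVE_LEGACY_BYTES = 512, XSAVE_HEADER_BYTES = 64 };

/* Size of a standard-format XSAVE area: end of the highest component,
   never less than the legacy region plus the XSAVE header. */
static unsigned xsave_area_bytes(const struct xsave_component *c, size_t n,
                                 int enabled_only)
{
    unsigned end = XSAVE_LEGACY_BYTES + XSAVE_HEADER_BYTES;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (enabled_only && !c[i].enabled)
            continue;
        if (c[i].offset + c[i].size > end)
            end = c[i].offset + c[i].size;
    }
    return end;
}

/* Values copied from the A10-9700 dump above. */
static const struct xsave_component a10_components[] = {
    { "AVX/YMM", 576, 256, 1 },
    { "LWP",     832, 128, 0 },
};
```

xsave_area_bytes(a10_components, 2, 1) gives 832, the ebx value, and
xsave_area_bytes(a10_components, 2, 0) gives 960, the ecx value I marked.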

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry