cygwin 3.5.4-1: signal handling destroys 'long double' values
Brian Inglis
Brian.Inglis@SystematicSW.ab.ca
Sun Oct 13 22:19:31 GMT 2024
On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> Hi Brian
>
> On Sun, 13 Oct 2024 10:41:58 -0600
> Brian Inglis wrote:
>> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
>>> Hi Brian,
>>>
>>> On Tue, 8 Oct 2024 10:37:14 -0600
>>> Brian Inglis wrote:
>>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
>>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
>>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
>>>>>> Christian Franke wrote:
>>>>>>> $ gcc -o sigtest -O2 sigtest.c
>>>>>>>
>>>>>>> $ ./sigtest > out.txt
>>>>>>> (press ^C 42x :-)
>>>>>>>
>>>>>>> $ sort out.txt | uniq -c
>>>>>>> 3 x = 0x1.23456789p+0, y = -nan, d = -nan
>>>>>>> 6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
>>>>>>> 33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
>>>>>>>
>>>>>>> The problem also occurs if compiled without -O2, but less often. No
>>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
>>>>>>> double' is affected.
>>>>>>
>>>>>> Thanks for the report. I looked into this problem and might find the
>>>>>> cause. It seems due to a bug of scripts/gendef. It generates signal
>>>>>> handler caller (sigfe.s) which stores/restores the registers.
>>>>>>
>>>>>> In sigdelayed, control word is stored/restored by fnstcw/fldcw instruction,
>>>>>> however, fninit instruction destroys some status registers in FPU (x87).
>>>>>>
>>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
>>>>>> However, I'm not familiar with x87 instructions, so I may overlook
>>>>>> something.
>>>>>>
>>>>>> Could anyone expert of x87 instructions and sigfe stuff give some
>>>>>> comments?
>>>>>
>>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
>>>>> current systems do more and use more than the legacy x87 instructions and stack.
>>>>>
>>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
>>>>> modern approaches.
>>>>>
>>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
>>>>
>>>> This is basically what ISTR:
>>>>
>>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
>>>>
>>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
>>>> SSE... instructions and XMM registers are used.
>>>
>>> Thanks for the advice. I read through the web pages and related documents
>>> and posted a patch which uses fxsave/fxrstor and xsave/xrstor to the
>>> cygwin-patches@cygwin.com mailing list.
>>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
>>>
>>> Is this as you intended?
>>
>> That seems to be the preferred approach now, as long as you can correctly
>> determine adequate space for fxsave and xsave, given the varying feature sets,
>> register counts, and register sizes of recent processors:
>> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
>
> Thanks for checking.
>
> According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> fxsave uses 512 bytes fixed length memory to save the current
> state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
>
> The patch allocates 0x238 bytes:
> 0x200 (512 bytes): fxsave area
> 0x008 ( 8 bytes): for 16-byte alignment
> 0x010 ( 16 bytes): work area
> 0x020 ( 32 bytes): reserved for later processing
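
That covers only the x87 FPU state, the MMX state, the 16 16-byte XMM
registers, and MXCSR.
Please also note that in 64-bit mode FXSAVE/FXRSTOR must be used with a 64-bit
operand size (REX.W prefix, i.e. fxsave64/fxrstor64) to save the 64-bit FPU
instruction and data pointers rather than the legacy 32-bit layout.
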
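As a rough sketch of the arithmetic in the layout quoted above: the four parts
add up to 0x238, and FXSAVE itself needs a 16-byte aligned memory operand. The
names below are my own labels for illustration, not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the four parts of the quoted layout. */
enum {
    FXSAVE_AREA   = 0x200, /* fixed 512-byte fxsave image */
    ALIGN_PAD     = 0x008, /* padding for 16-byte alignment */
    WORK_AREA     = 0x010, /* work area */
    RESERVED_AREA = 0x020  /* reserved for later processing */
};

/* Total number of bytes in the frame: the sum of the four parts. */
size_t fxsave_frame_size(void)
{
    return FXSAVE_AREA + ALIGN_PAD + WORK_AREA + RESERVED_AREA;
}

/* Round addr up to a multiple of align, which must be a power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

fxsave_frame_size() returns 0x238, and align_up(p, 16) gives the first address
at or above p that is usable as an fxsave operand.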
> According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> cpuid instruction with eax=0dh and ecx=00h returns the maximum
> size required by xsave in ebx. So the patch allocates:
> ebx + 0x048 bytes.
> 0x018 ( 24 bytes): for 64-byte alignment
> 0x010 ( 16 bytes): work area
> 0x020 ( 32 bytes): reserved for later processing

That is the size for the features currently enabled in XCR0 (user state). The
size for all features the processor supports, whether or not they are enabled
or in use, is returned in ecx.
You need 2KB just to store the 32 64-byte ZMM registers, and new real and
virtual features may require more.
It may be conservative, but I would suggest allocating the size in ecx as
documented, in case of future changes; that can be reduced to 512 if only
fxsave is supported.
I suggest you check for fxsave in cpuid leaf 1 (sub-leaf 0), edx bit 24, fall
back to fnsave/frstor if it is absent, and keep everything aligned to 64 bytes
for safety.

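A minimal sketch of that selection logic, assuming GCC or Clang with <cpuid.h>.
The function names and the OSXSAVE test (cpuid leaf 1, ecx bit 27) are my own
choices for illustration, not what the patch does. The fnsave image is taken to
be 108 bytes (32-bit protected-mode format).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>          /* GCC/Clang wrappers for the cpuid instruction */
#define HAVE_X86_CPUID 1
#endif

enum {
    FNSAVE_SIZE   = 108,    /* fnsave image, 32-bit protected-mode format */
    FXSAVE_SIZE   = 512,    /* fixed fxsave image */
    XSAVE_MINIMUM = 576,    /* 512-byte legacy area + 64-byte xsave header */
    SAVE_ALIGN    = 64      /* keep everything 64-byte aligned */
};

/* Pick a save-area size from raw cpuid register values (pure function). */
size_t pick_save_size(unsigned leaf1_ecx, unsigned leaf1_edx,
                      unsigned xsave_all_bytes)
{
    int has_fxsr    = (leaf1_edx >> 24) & 1;  /* FXSAVE/FXRSTOR */
    int has_osxsave = (leaf1_ecx >> 27) & 1;  /* OS-enabled XSAVE */
    size_t size;

    if (has_osxsave && xsave_all_bytes >= XSAVE_MINIMUM)
        size = xsave_all_bytes;   /* leaf 0xd/0 ecx: all supported features */
    else if (has_fxsr)
        size = FXSAVE_SIZE;
    else
        size = FNSAVE_SIZE;

    assert(size > 0);
    /* Round up to a multiple of SAVE_ALIGN. */
    return (size + SAVE_ALIGN - 1) & ~(size_t)(SAVE_ALIGN - 1);
}

/* Query the running CPU; on non-x86 builds this reports the fnsave case. */
size_t query_save_size(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    unsigned leaf1_ecx = 0, leaf1_edx = 0, xsave_all = 0;
#ifdef HAVE_X86_CPUID
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        leaf1_ecx = ecx;
        leaf1_edx = edx;
    }
    if (__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        xsave_all = ecx;
#endif
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return pick_save_size(leaf1_ecx, leaf1_edx, xsave_all);
}
```

Keeping the decision in pick_save_size() separate from the cpuid query makes it
possible to check the fallbacks with made-up register values.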
For my AMD A10-9700 /proc/cpuinfo shows:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid
aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
*xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm
perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep
bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
and /usr/bin/cpuid (package cpuid) shows (I marked the relevant lines with !):
...
feature information (1/edx):
x87 FPU on chip = true
VME: virtual-8086 mode enhancement = true
DE: debugging extensions = true
PSE: page size extensions = true
TSC: time stamp counter = true
RDMSR and WRMSR support = true
PAE: physical address extensions = true
MCE: machine check exception = true
CMPXCHG8B inst. = true
APIC on chip = true
SYSENTER and SYSEXIT = true
MTRR: memory type range registers = true
PTE global bit = true
MCA: machine check architecture = true
CMOV: conditional move/compare instr = true
PAT: page attribute table = true
PSE-36: page size extension = true
PSN: processor serial number = false
CLFLUSH instruction = true
DS: debug store = false
ACPI: thermal monitor and clock ctrl = false
MMX Technology = true
! FXSAVE/FXRSTOR = true
SSE extensions = true
SSE2 extensions = true
SS: self snoop = false
hyper-threading / multi-core supported = true
TM: therm. monitor = false
IA64 = false
PBE: pending break event = false
feature information (1/ecx):
PNI/SSE3: Prescott New Instructions = true
PCLMULDQ instruction = true
DTES64: 64-bit debug store = false
MONITOR/MWAIT = true
CPL-qualified debug store = false
VMX: virtual machine extensions = false
SMX: safer mode extensions = false
Enhanced Intel SpeedStep Technology = false
TM2: thermal monitor 2 = false
SSSE3 extensions = true
context ID: adaptive or shared L1 data = false
SDBG: IA32_DEBUG_INTERFACE = false
FMA instruction = true
CMPXCHG16B instruction = true
xTPR disable = false
PDCM: perfmon and debug = false
PCID: process context identifiers = false
DCA: direct cache access = false
SSE4.1 extensions = true
SSE4.2 extensions = true
x2APIC: extended xAPIC support = false
MOVBE instruction = true
POPCNT instruction = true
time stamp counter deadline = false
AES instruction = true
XSAVE/XSTOR states = true
! OS-enabled XSAVE/XSTOR = true
AVX: advanced vector extensions = true
F16C half-precision convert instruction = true
RDRAND instruction = true
hypervisor guest status = false
...
XSAVE features (0xd/0):
XCR0 valid bit field mask = 0x4000000000000007
x87 state = true
SSE state = true
AVX state = true
MPX BNDREGS = false
MPX BNDCSR = false
AVX-512 opmask = false
AVX-512 ZMM_Hi256 = false
AVX-512 Hi16_ZMM = false
PKRU state = false
XTILECFG state = false
XTILEDATA state = false
bytes required by fields in XCR0 = 0x00000340 (832)
! bytes required by XSAVE/XRSTOR area = 0x000003c0 (960)
XSAVEOPT instruction = true
XSAVEC instruction = false
XGETBV instruction = false
XSAVES/XRSTORS instructions = false
XFD: extended feature disable supported = false
SAVE area size in bytes = 0x00000000 (0)
IA32_XSS valid bit field mask = 0x0000000000000000
PT state = false
PASID state = false
CET_U user state = false
CET_S supervisor state = false
HDC state = false
UINTR state = false
LBR state = false
HWP state = false
AVX/YMM features (0xd/2):
AVX/YMM save state byte size = 0x00000100 (256)
AVX/YMM save state byte offset = 0x00000240 (576)
supported in IA32_XSS or XCR0 = XCR0 (user state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
LWP features (0xd/0x3e):
LWP save state byte size = 0x00000080 (128)
LWP save state byte offset = 0x00000340 (832)
supported in IA32_XSS or XCR0 = XCR0 (user state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
...
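To show how the two sizes above relate, here is a small sketch that recomputes
them from the sub-leaf offsets and sizes in this output. It assumes the
standard (non-compacted) XSAVE format, where the area ends at the highest
offset + size among the selected components. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One extended state component as reported by cpuid leaf 0xd, sub-leaf n. */
struct xsave_component {
    unsigned bit;     /* XCR0 bit number, same as the sub-leaf number */
    uint32_t offset;  /* save state byte offset */
    uint32_t size;    /* save state byte size */
};

/* 512-byte legacy fxsave region plus the 64-byte xsave header. */
enum { XSAVE_LEGACY_AND_HEADER = 512 + 64 };

/* Values copied from the A10-9700 output above. */
const struct xsave_component a10_components[] = {
    {  2, 0x240, 0x100 },  /* AVX/YMM (0xd/2) */
    { 62, 0x340, 0x080 },  /* LWP (0xd/0x3e) */
};

/* Size of a standard-format XSAVE area for the components set in mask. */
uint32_t xsave_standard_size(uint64_t mask,
                             const struct xsave_component *c, size_t n)
{
    uint32_t end = XSAVE_LEGACY_AND_HEADER;
    size_t i;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if ((mask >> c[i].bit) & 1) {
            uint32_t e = c[i].offset + c[i].size;
            if (e > end)
                end = e;
        }
    }
    return end;
}
```

With mask 0x7 (x87, SSE, AVX) this gives 0x340 (832), the size for the fields
in XCR0. With the full valid mask 0x4000000000000007 it gives 0x3c0 (960), the
XSAVE/XRSTOR area size marked with ! above.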
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut
-- Antoine de Saint-Exupéry