cygwin 3.5.4-1: signal handling destroys 'long double' values
Takashi Yano
takashi.yano@nifty.ne.jp
Mon Oct 14 05:29:58 GMT 2024
Hi Brian,
Thanks for the detailed explanation.
On Sun, 13 Oct 2024 16:19:31 -0600
Brian Inglis wrote:
> On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > Hi Brian
> >
> > On Sun, 13 Oct 2024 10:41:58 -0600
> > Brian Inglis wrote:
> >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> >>> Hi Brian,
> >>>
> >>> On Tue, 8 Oct 2024 10:37:14 -0600
> >>> Brian Inglis wrote:
> >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> >>>>>> Christian Franke wrote:
> >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> >>>>>>>
> >>>>>>> $ ./sigtest > out.txt
> >>>>>>> (press ^C 42x :-)
> >>>>>>>
> >>>>>>> $ sort out.txt | uniq -c
> >>>>>>> 3 x = 0x1.23456789p+0, y = -nan, d = -nan
> >>>>>>> 6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> >>>>>>> 33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> >>>>>>>
> >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> >>>>>>> double' is affected.
> >>>>>>
> >>>>>> Thanks for the report. I looked into this problem and may have found the
> >>>>>> cause. It seems to be due to a bug in scripts/gendef, which generates the
> >>>>>> signal handler caller (sigfe.s) that saves and restores the registers.
> >>>>>>
> >>>>>> In sigdelayed, the control word is saved and restored with the fnstcw/fldcw
> >>>>>> instructions; however, the fninit instruction destroys some status
> >>>>>> registers in the x87 FPU.
> >>>>>>
> >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> >>>>>> However, I'm not familiar with x87 instructions, so I may be overlooking
> >>>>>> something.
> >>>>>>
> >>>>>> Could anyone familiar with x87 instructions and the sigfe code comment
> >>>>>> on this?
> >>>>>
> >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
> >>>>> current systems do more and use more than the legacy x87 instructions and stack.
> >>>>>
> >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> >>>>> modern approaches.
> >>>>>
> >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> >>>>
> >>>> This is basically what ISTR:
> >>>>
> >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> >>>>
> >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> >>>> SSE... instructions and XMM registers are used.
> >>>
> >>> Thanks for the advice. I read through the web pages and related documents
> >>> and posted a patch which uses fxsave/fxrstor and xsave/xrstor to the
> >>> cygwin-patches@cygwin.com mailing list.
> >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> >>>
> >>> Is this as you intended?
> >>
> >> That seems to be the preferred approach now, as long as you can correctly
> >> determine adequate space for fxsave and xsave, given the varying feature sets,
> >> register counts, and register sizes of recent processors:
> >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> >
> > Thanks for checking.
> >
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > fxsave uses 512 bytes fixed length memory to save the current
> > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> >
> > The patch allocates 0x238 bytes:
> > 0x200 (512 bytes): fxsave area
> > 0x008 ( 8 bytes): for 16-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> Please also note that 64 bit operands or REX prefix must be used with
> FXSAVE/FXRSTOR to save expanded state rather than legacy state.
Fixed.
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > size required by xsave in ebx. So the patch allocates:
> > ebx + 0x048 bytes.
> > 0x018 ( 24 bytes): for 64-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is for features currently enabled in XCR0 user state, not all the values of
> all possible registers, for all possible features, in ecx, which are supported,
> may be enabled, and in use.
> You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual
> features may require more.
Do you mean we should use the ecx value rather than the ebx value returned by
cpuid (eax=0dh, ecx=0)? I did not understand the difference between the
values of ebx and ecx returned by cpuid.
Fixed.
> It may be conservative, but I would suggest allocating the space in ecx as
> documented, just in case of future changes, and that can be reduced to 512 if
> only fxsave is supported.
> I suggest you should check for fxsave in cpuid 1:0 edx:24, fall back to
> fnsave/frstor if not, and keep everything aligned to 64 bytes for safety.
According to my survey, all Intel and AMD CPUs (that is, all x86 CPUs)
have fxsave/fxrstor. So we do not need to check bit 24, do we?
> For my AMD A10-9700 /proc/cpuinfo shows:
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid
> aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
> *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm
> perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep
> bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
> decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
>
> and /usr/bin/cpuid (package cpuid) shows (see my added !):
>
> ...
> feature information (1/edx):
> x87 FPU on chip = true
> VME: virtual-8086 mode enhancement = true
> DE: debugging extensions = true
> PSE: page size extensions = true
> TSC: time stamp counter = true
> RDMSR and WRMSR support = true
> PAE: physical address extensions = true
> MCE: machine check exception = true
> CMPXCHG8B inst. = true
> APIC on chip = true
> SYSENTER and SYSEXIT = true
> MTRR: memory type range registers = true
> PTE global bit = true
> MCA: machine check architecture = true
> CMOV: conditional move/compare instr = true
> PAT: page attribute table = true
> PSE-36: page size extension = true
> PSN: processor serial number = false
> CLFLUSH instruction = true
> DS: debug store = false
> ACPI: thermal monitor and clock ctrl = false
> MMX Technology = true
> ! FXSAVE/FXRSTOR = true
> SSE extensions = true
> SSE2 extensions = true
> SS: self snoop = false
> hyper-threading / multi-core supported = true
> TM: therm. monitor = false
> IA64 = false
> PBE: pending break event = false
> feature information (1/ecx):
> PNI/SSE3: Prescott New Instructions = true
> PCLMULDQ instruction = true
> DTES64: 64-bit debug store = false
> MONITOR/MWAIT = true
> CPL-qualified debug store = false
> VMX: virtual machine extensions = false
> SMX: safer mode extensions = false
> Enhanced Intel SpeedStep Technology = false
> TM2: thermal monitor 2 = false
> SSSE3 extensions = true
> context ID: adaptive or shared L1 data = false
> SDBG: IA32_DEBUG_INTERFACE = false
> FMA instruction = true
> CMPXCHG16B instruction = true
> xTPR disable = false
> PDCM: perfmon and debug = false
> PCID: process context identifiers = false
> DCA: direct cache access = false
> SSE4.1 extensions = true
> SSE4.2 extensions = true
> x2APIC: extended xAPIC support = false
> MOVBE instruction = true
> POPCNT instruction = true
> time stamp counter deadline = false
> AES instruction = true
> XSAVE/XSTOR states = true
> ! OS-enabled XSAVE/XSTOR = true
> AVX: advanced vector extensions = true
> F16C half-precision convert instruction = true
> RDRAND instruction = true
> hypervisor guest status = false
> ...
> XSAVE features (0xd/0):
> XCR0 valid bit field mask = 0x4000000000000007
> x87 state = true
> SSE state = true
> AVX state = true
> MPX BNDREGS = false
> MPX BNDCSR = false
> AVX-512 opmask = false
> AVX-512 ZMM_Hi256 = false
> AVX-512 Hi16_ZMM = false
> PKRU state = false
> XTILECFG state = false
> XTILEDATA state = false
> bytes required by fields in XCR0 = 0x00000340 (832)
Is this ebx
> ! bytes required by XSAVE/XRSTOR area = 0x000003c0 (960)
and is this ecx from cpuid (0d:0)? I checked several of my
environments, but ebx and ecx always had the same value, so
I thought either could be used...
Please check v2 patch.
--
Takashi Yano <takashi.yano@nifty.ne.jp>