This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Fix x86 sqrt rounding (bug 14032)
- From: Rich Felker <dalias at aerifal dot cx>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Richard Henderson <rth at twiddle dot net>, libc-alpha at sourceware dot org
- Date: Wed, 27 Nov 2013 21:23:20 -0500
- References: <Pine dot LNX dot 4 dot 64 dot 1311271803540 dot 7837 at digraph dot polyomino dot org dot uk> <52966555 dot 20603 at twiddle dot net> <20131127232338 dot GP24286 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1311280049090 dot 5433 at digraph dot polyomino dot org dot uk> <20131128011316 dot GR24286 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1311280156250 dot 5433 at digraph dot polyomino dot org dot uk>
On Thu, Nov 28, 2013 at 02:02:08AM +0000, Joseph S. Myers wrote:
> On Wed, 27 Nov 2013, Rich Felker wrote:
>
> > Well then pending resolution of bug 16068, this would be something of
> > a regression. It's also unfortunate that POSIX does not define the
>
> As I said, various functions already set precision temporarily; it's not
> new that libm does this.

I see. Then I agree, the issues are separate.

> > fenv functions as AS-safe, so a conforming POSIX program *cannot* do
> > what C11 recommends.
>
> I.e., POSIX programs can't use floating point in signal handlers.

This may be correct, or it may be that such implementations that
clobber the fpu state are not valid in C99/POSIX unless the signal
handling implementation fixes the state before the signal handler is
invoked. I haven't read the relevant text of the standards well enough
to know which is the case, so apologies if you or somebody else
already knows and I'm just re-raising a question with a known answer.

> Naturally I think we should document an intent that the fenv functions are
> AS-safe provided you restore the original environment before leaving the
> handler.

Agreed. And I assume the next issue of POSIX will be aligned with C11
and will also document this.

> > Anyway, I think the proper next step is comparing performance. My
> > intuition is that changing the control register is going to be a lot
> > slower than the typical path in the first patch proposed, which
> > essentially adds just an ld80 store and double store/load pair. Note
> > that a benchmark should not use the testcase values (which are
> > numerically rare and intentionally chosen to hit the double-rounding
> > issue) unless the intent is to optimize worst-case rather than
> > average runtime.
>
> With inputs from
> <https://sourceware.org/ml/libc-alpha/2013-10/msg00382.html> and testing
> on a Sandy Bridge Xeon:
>
> Unmodified glibc:
> sqrt(): ITERS:1.83965e+09: TOTAL:31880.5Mcy, MAX:111.358cy, MIN:8.524cy, 57704.6 calls/Mcy
>
> First patch (adjustment using C1 bit):
> sqrt(): ITERS:1.84168e+09: TOTAL:31880.6Mcy, MAX:142.333cy, MIN:8.871cy, 57768.1 calls/Mcy
>
> Second patch (temporarily changing precision):
> sqrt(): ITERS:1.84008e+09: TOTAL:31880.4Mcy, MAX:125.583cy, MIN:8.488cy, 57718.3 calls/Mcy
>
> I interpret this as meaning there is no significant performance difference
> between the approaches and no significant performance loss from these
> changes.

My inclination is to agree, but the numbers seem a bit odd. In
particular, the calls/Mcy values seem inconsistent with the MAX/MIN
cycle counts. A lower cycle count should yield more calls/Mcy, not
fewer, no? Or is this just a measurement precision error?

Rich
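
A note on the figures quoted above: the reported calls/Mcy values appear
to equal ITERS divided by TOTAL (in Mcy). If so, they are a mean
throughput over the whole run and need not track the per-call MIN or MAX
extremes. This is inferred only from the posted numbers, not from the
benchmark's source. The helper name calls_per_mcy is hypothetical. A
small sketch in C that recomputes the three values:

```c
#include <stdio.h>

/* Hypothetical helper: mean throughput in calls per million cycles,
   ASSUMING calls/Mcy = ITERS / TOTAL with TOTAL expressed in Mcy. */
static double calls_per_mcy(double iters, double total_mcy)
{
    return iters / total_mcy;
}

/* Prints the recomputed throughput next to the value from the post
   for each of the three runs. */
static void print_throughput_check(void)
{
    static const struct {
        const char *label;
        double iters, total_mcy, reported;
    } runs[] = {
        { "unmodified",   1.83965e+09, 31880.5, 57704.6 },
        { "first patch",  1.84168e+09, 31880.6, 57768.1 },
        { "second patch", 1.84008e+09, 31880.4, 57718.3 },
    };

    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-12s recomputed %.1f, reported %.1f calls/Mcy\n",
               runs[i].label,
               calls_per_mcy(runs[i].iters, runs[i].total_mcy),
               runs[i].reported);
}
```

The recomputed values agree with the reported ones to within the
rounding of the six-digit ITERS figures. The three runs differ by about
0.1% in throughput, while TOTAL is nearly identical across them.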