This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
libm function variants in glibc

From: "Joseph S. Myers" <joseph at codesourcery dot com>
To: <libc-alpha at sourceware dot org>
Cc: <olga dot kupriianova at lip6 dot fr>, <christoph dot lauter at ens-lyon dot org>
Date: Tue, 23 Jul 2013 16:45:21 +0000
Subject: libm function variants in glibc
The following are some comments on variants of libm functions in glibc, 
following reading the metalibm slides now the Cauldron slides have been 
posted to the GCC wiki.  I'm talking about libm in general, although some 
areas such as <complex.h> functions don't currently seem to be considered 
by metalibm.


1. Discussions relating to what goes in glibc libm should of course take 
place on the libc-alpha mailing list; discussions elsewhere, as referred 
to on slide 24, are not particularly relevant to establishing community 
consensus.


2. See <http://sourceware.org/ml/libc-alpha/2013-05/msg00132.html> for a 
detailed description of what I think the accuracy goals are for glibc libm 
functions at present.


3. The principle of generating function implementations, for the cases 
using polynomial approximations, automatically using metalibm, certainly 
makes sense and seems better than the existing cases where tables of 
polynomial coefficients (or other precomputed values) were probably 
computed with a range of software, not all free software, and no 
documented way to recompute or verify those values exists, and there is 
no good way to determine error bounds either.


4. A large part of libm doesn't involve polynomial approximation or other 
precomputed values at all.  Such tools as metalibm would seem to help much 
less for implementing all the new functions from TS 18661-1 (bindings to 
IEEE 754-2008), if there were interest in adding those to glibc, for 
example; the only ones for which polynomial approximation could be 
relevant at all are the formatOf sqrt variants (fsqrt, fsqrtl, dsqrtl - 
compute a square root as if to infinite precision and round once to a 
narrower type without double rounding).

(This does not mean such tools are of no use in such cases - the ability 
to produce C code with a correctness proof would certainly be of value in 
cases such as fma - look at the number of bugs I found in the various fma 
implementations when reviewing/fixing them for correctness in all cases 
including non-default rounding modes, even though the basic algorithm, 
ignoring overflow and underflow cases, was proved correct by Boldo and 
Melquiond.  But there you have the further issue that it's not quite just 
standard C being used - it's C with additional macros to ensure things get 
evaluated on the correct side of rounding-mode changes or exception tests, 
given the limitations on optimizations in GCC not respecting those things 
at present.  That issue of interactions of GCC optimizations with 
rounding-mode changes and exception tests could also apply to other code 
generated by metalibm, I suppose.)

These functions - functions that are in general fully specified for ISO C 
(+ Annex F), and for which self-contained theoretical arguments for 
correctness can be given that do not rely on exhaustive searches for worst 
cases for correct rounding - are also the most likely to have 
architecture-specific variants in glibc's sysdeps directories, because 
processors are quite likely to have instructions for some or all of them.  
But there are plenty of architecture-specific variants for other functions 
as well - quite a few for x86/x86_64, and almost the entire libm for ia64.  
Any comparison of new function implementations with old (whether for speed 
or for accuracy) needs to take the architecture-specific variants into 
account; in some cases it may be appropriate for an architecture to stop 
using an architecture-specific variant, in other cases not.


5. To the extent that *new* functions are referred to at the user API 
level - functions not currently provided by libm at all - I think we 
should be guided by ISO C API extensions from WG14; functions not in such 
API extensions are probably better added elsewhere (e.g. GSL) rather than 
speculatively provided in glibc.

I think it *would* be reasonable to add any of the functions from ISO 
24747 ("Extensions to the C Library, to Support Mathematical Special 
Functions"), although those are likely to be particularly hard to 
implement (involving many numerically sensitive cases).

It would also be reasonable to add any of the functions listed in IEEE 
754-2008 that aren't currently in the standard C library, *once a draft of 
TS 18661-4 is available to indicate what the C APIs for such functions, 
and special cases, should look like*.  (This refers to functions such as 
sinpi.  In the case of exp10, we already have it as an extension to ISO C.  
Most of these functions are straightforward to implement if you don't mind 
errors of a few ulp.)

It would also be reasonable to add any of the functions C11 reserves for 
addition to <complex.h> (we already have clog10), with the caveat that 
it's not immediately obvious what branch cuts should look like for 
clgamma.  The difficulty of implementing such functions varies.

For any new functions, it would definitely help to have them implemented 
first in MPFR/MPC for convenience in generating testcases, although I 
don't think we can require that.


6. An important axis of variation in libm function implementations - 
though not in the interface provided to users - that doesn't seem to be 
mentioned in the metalibm slides (and I couldn't find anything about it in 
the metalibm implementation either) is the choice of the type used for 
arithmetic within the implementation, as distinct from the type used for 
arguments and return values.  Thus, when implementing a float function 
(for example - similar issues apply to double functions as well), say sinf 
(and presuming float is IEEE binary32, as it always is in glibc), there 
are several choices, and metalibm would need to allow for this issue when 
being used to implement functions for glibc:

(a) Implement using float as the internal implementation type.  This is 
common in glibc, but maybe not such a good idea in reality.  For most 
architectures it's likely use of double internally is more efficient for a 
given accuracy requirement than using float internally, as it may reduce 
the need for precision-extension techniques (Dekker etc.).  In certain 
cases where float is supported in hardware but double isn't, it's 
conceivable using float internally could be better.

(b) Implement using double as the internal implementation type (with 
double+double if correct rounding is required *and* just one double isn't 
accurate enough for the given inputs).  This is likely to be appropriate 
in most cases.

(c) Implement using long double as the internal implementation type, where 
long double is the x86/ia64/m68k "extended" type (the m68k version being 
slightly different from the others).  This is indicated on 32-bit x86, 
where you simply can't use float or double as internal type reliably 
because of the workings of the x87 FPU.  (You can set and restore the 
precision control bits, and some glibc libm functions do so, but the 
handling of values with out-of-range exponent will still depend on which 
computed values GCC decides to spill to memory, so any proofs would need 
to allow for that, and setting and restoring precision control has its own 
performance cost.)  It may be indicated on some m68k systems, for similar 
reasons.  I don't know whether it's appropriate on ia64 (that probably 
depends on the performance characteristics of float / double / long double 
arithmetic, since the hardware properly supports all of those).  It's 
probably not appropriate on x86_64, since the SSE floating-point unit is 
supposed to be faster than the x87 one.  And on 32-bit x86, it may be 
appropriate to use STT_GNU_IFUNC so that functions can be implemented 
using SSE2 on hardware supporting that and only using x87 for the "long 
double" functions or for hardware without SSE2.

(d) Implement using long double as the internal implementation type, where 
long double is binary128.  Most platforms with binary128 implement it in 
software, meaning this is not the most efficient approach.  I think S/390 
may implement it in hardware, however (at least, GCC has instruction 
patterns for it; I don't know if the instructions are actually implemented 
in hardware or through kernel emulation).

(e) Implement with software floating point (using soft-fp interfaces 
directly).  Although more efficient on systems without hardware 
floating-point support than using precision-extension techniques layered 
on top of software floating point, I don't expect anyone actually to 
implement this, as users seriously concerned about efficiency of 
floating-point functions are unlikely to be using such processors.  So 
this option is just listed for completeness.


7. A persistent source of issues in glibc's libm is powerpc with IBM long 
double.  I don't know if there's anything metalibm can do there; such a 
non-IEEE type is generally problematic for lots of things that work well 
with IEEE floating-point types.  One might imagine it could be used to 
generate polynomial coefficients, even if precise error bounds can't 
usefully be proved when this type is involved.


8. 12000 function variants (80 per (operation, type) pair), as referred to 
in the metalibm slides, would clearly be absurd for glibc.  We need to 
keep the size of the libm.so binary under control.  Also, the interface 
provided by libm.so under a particular symbol name needs to be 
well-defined, not depending on how glibc is configured, and once such a 
symbol is exported we need to support it forever.

Right now, we have *two* variants for most functions:

(i) The default version, with a name such as "sin" - see the message I 
referred to under 2. above for the goals for such a function.

(ii) A version __<func>_finite, used when a user program is built with 
-ffinite-math-only.  For glibc libm, -ffinite-math-only is considered to 
imply -fno-math-errno (all this means is no errno settings on underflow, 
since -ffinite-math-only implies that no other inputs that could involve 
errno setting are valid, and ISO C leaves it implementation-defined 
whether errno is set on underflow even when (math_errhandling & 
MATH_ERRNO) is nonzero).

The relevant existing GCC options are -ffast-math (an option implying 
several others), -funsafe-math-optimizations (which itself implies a few 
others), -ffinite-math-only, -fno-math-errno, -fno-signaling-nans 
(default), -fno-rounding-math (default), -fcx-limited-range, 
-fno-trapping-math, -fno-signed-zeros, -fassociative-math, 
-freciprocal-math.

Trying to define libm functions variants for all combinations of these 
would be excessive.  I suggest that the *most* that makes sense is the 
following eight variants, and this could reasonably be cut further.

(a) Correctly rounded.  These should be named e.g. crsin following TS 
18661-4, when provided at all.  They should handle all rounding modes, 
produce accurate exceptions including "inexact", and accurate errno.

(b) -ffast-math.  This would imply no error or exception guarantees, only 
finite arguments and results need be handled, and the result is within a 
few ulp of the correct result for inputs within a few ulp of the input 
value actually passed to the function (so there could be large errors in 
numerically unstable regions).

(c) General-purpose, with goals as described before in 2.; these would get 
the linkage names such as "sin", and would handle all rounding modes and 
errno and exception (except "inexact") setting.  If automatically 
generated with error guarantees, say < 1ulp in round-to-nearest and < 2ulp 
in other modes.  The same goals would apply for all the following variants 
with the implied limitations (-ffinite-math-only means no need to handle 
non-finite inputs or outputs or to set errno on underflow; -fno-math-errno 
means no need to set errno for any inputs; -fno-rounding-math means the 
function can assume it is called with round-to-nearest mode).

(d) -ffinite-math-only (these entry points already exist).

(e) -fno-math-errno (in most cases, these could in fact alias the 
-ffinite-math-only entry points as all those do at present is bypass 
errno-setting wrappers).

(f) -fno-rounding-math.

(g) -fno-math-errno -fno-rounding-math.

(h) -ffinite-math-only -fno-math-errno -fno-rounding-math.

One might reasonably exclude the (e), (f) and (g) entry points to reduce 
the interface further, to five entry points per function (and many in 
practice wouldn't have the correctly-rounded entry point, at least for 
long double).

To keep libm code size down, I'd expect all of (c) to (h) to be 
implemented as alternative entry points (maybe aliases in some cases) to a 
common core implementation rather than duplicating code for them.  At 
present this might mean using some form of wrapper functions, as are 
presently used for errno setting, but one might imagine adding a GCC 
extension to support other forms of multiple entry points without such 
separate functions.

To keep the dynamic symbol table size down, entry points other than for 
(a) and (c) should only be provided where actually needed; all other cases 
require header magic to map a function to an appropriate linkage name, and 
such magic should use a minimal number of linkage names; once such a name 
is added, we're then stuck with supporting it even if a future 
implementation can use an alias.  For example, on platforms where long 
double has the same representation as double, glibc needs to provide 
symbols such as "sinl" (ISO C allows declaring such functions yourself 
rather than using <math.h>), although it's an alias to "sin".  But there 
is no need to provide __sinl_finite, and the ABI size is kept down by 
mapping sinl calls with -ffinite-math-only directly to __sin_finite rather 
than putting __sinl_finite in the ABI.  This applies to all cases where 
different functions can be aliases: don't add an entry point to the ABI 
unless and until it's needed, whether because it is actually providing 
something for which an alias would not suffice, or because it's required 
by ISO C or an ISO C extension.

Note that GCC does not predefine macros for -fno-math-errno or 
-fno-rounding-math.  If you wanted to define entry points to glibc for 
those options, you might want to make GCC predefine such macros first.  
The same applies for -fno-trapping-math, where I haven't proposed entry 
points in the list above but they might make sense in a limited number of 
cases where glibc needs to go to special lengths to *avoid* generating 
spurious "inexact" exceptions (so -fno-trapping-math could use entry 
points that don't worry about spurious exceptions).


9. It would be reasonable to implement the TS 18661-3 functions for 
particular IEEE 754-2008 floating-point types.  The main thing needed 
there would be GCC support for _Float32 etc.; then it would "just" be a 
matter of adding aliases and header declarations (without making glibc's 
ABI depend on the compiler support).  The points made under 8. above apply 
to determining what does or does not need an alias in the ABI.

Such an addition could eventually obsolete GCC's libquadmath by providing 
the _Float128 functions directly in glibc's libm under the names proposed 
for TS 18661-3.  It would however be necessary to have the ldbl-128 
functions implemented in such a way that they can be used both for long 
double (_Float128 functions being aliases) and for _Float128 when that 
type is not long double.  The fact that functions for binary128 might need 
to call the binary128 type by different C names (and use different 
suffixes on constants etc.) depending on the architecture is something to 
consider in making metalibm generate such functions.  (Right now there is 
an error-prone manual editing process involved in adapting the ldbl-128 
functions for libquadmath.)


10. None of the above considers entry points whose ABI, other than the 
symbol name, is different from that for standard libm functions.  The main 
case for that is probably entry points taking and returning vectors.  The 
general issue of keeping down the ABI size applies to any such entry 
points, as does the issue of avoiding the libm ABI depending on what 
vector ISA extensions are supported by the processor for which glibc is 
built.  Al Grant has suggested the consideration of vector types for C as 
part of a discussion of parallel language extensions (discussion starting 
at <http://www.open-std.org/pipermail/cplex/2013-June/000010.html>); it's 
not clear what if anything might eventually come of that.

-- 
Joseph S. Myers
joseph@codesourcery.com
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]