This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
libm function variants in glibc
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: <libc-alpha at sourceware dot org>
- Cc: <olga dot kupriianova at lip6 dot fr>, <christoph dot lauter at ens-lyon dot org>
- Date: Tue, 23 Jul 2013 16:45:21 +0000
- Subject: libm function variants in glibc
The following are some comments on variants of libm functions in glibc,
following reading the metalibm slides now the Cauldron slides have been
posted to the GCC wiki. I'm talking about libm in general, although some
areas such as <complex.h> functions don't currently seem to be considered
by metalibm.
1. Discussions relating to what goes in glibc libm should of course take
place on the libc-alpha mailing list; discussions elsewhere, as referred
to on slide 24, are not particularly relevant to establishing community
consensus.
2. See <http://sourceware.org/ml/libc-alpha/2013-05/msg00132.html> for a
detailed description of what I think the accuracy goals are for glibc libm
functions at present.
3. The principle of generating function implementations, for the cases
using polynomial approximations, automatically using metalibm, certainly
makes sense and seems better than the existing cases where tables of
polynomial coefficients (or other precomputed values) were probably
computed with a range of software, not all free software, and no
documented way to recompute or verify those values exists, and there is
no good way to determine error bounds either.
4. A large part of libm doesn't involve polynomial approximation or other
precomputed values at all. Such tools as metalibm would seem to help much
less for implementing all the new functions from TS 18661-1 (bindings to
IEEE 754-2008), if there were interest in adding those to glibc, for
example; the only ones for which polynomial approximation could be
relevant at all are the formatOf sqrt variants (fsqrt, fsqrtl, dsqrtl -
compute a square root as if to infinite precision and round once to a
narrower type without double rounding).
(This does not mean such tools are of no use in such cases - the ability
to produce C code with a correctness proof would certainly be of value in
cases such as fma - look at the number of bugs I found in the various fma
implementations when reviewing/fixing them for correctness in all cases
including non-default rounding modes, even though the basic algorithm,
ignoring overflow and underflow cases, was proved correct by Boldo and
Melquiond. But there you have the further issue that it's not quite just
standard C being used - it's C with additional macros to ensure things get
evaluated on the correct side of rounding-mode changes or exception tests,
given the limitations on optimizations in GCC not respecting those things
at present. That issue of interactions of GCC optimizations with
rounding-mode changes and exception tests could also apply to other code
generated by metalibm, I suppose.)
These functions - functions that are in general fully specified for ISO C
(+ Annex F), and for which self-contained theoretical arguments for
correctness can be given that do not rely on exhaustive searches for worst
cases for correct rounding - are also the most likely to have
architecture-specific variants in glibc's sysdeps directories, because
processors are quite likely to have instructions for some or all of them.
But there are plenty of architecture-specific variants for other functions
as well - quite a few for x86/x86_64, and almost the entire libm for ia64.
Any comparison of new function implementations with old (whether for speed
or for accuracy) needs to take the architecture-specific variants into
account; in some cases it may be appropriate for an architecture to stop
using an architecture-specific variant, in other cases not.
5. To the extent that *new* functions are referred to at the user API
level - functions not currently provided by libm at all - I think we
should be guided by ISO C API extensions from WG14; functions not in such
API extensions are probably better added elsewhere (e.g. GSL) rather than
speculatively provided in glibc.
I think it *would* be reasonable to add any of the functions from ISO
24747 ("Extensions to the C Library, to Support Mathematical Special
Functions"), although those are likely to be particularly hard to
implement (involving many numerically sensitive cases).
It would also be reasonable to add any of the functions listed in IEEE
754-2008 that aren't currently in the standard C library, *once a draft of
TS 18661-4 is available to indicate what the C APIs for such functions,
and special cases, should look like*. (This refers to functions such as
sinpi. In the case of exp10, we already have it as an extension to ISO C.
Most of these functions are straightforward to implement if you don't mind
errors of a few ulp.)
It would also be reasonable to add any of the functions C11 reserves for
addition to <complex.h> (we already have clog10), with the caveat that
it's not immediately obvious what branch cuts should look like for
clgamma. The difficulty of implementing such functions varies.
For any new functions, it would definitely help to have them implemented
first in MPFR/MPC for convenience in generating testcases, although I
don't think we can require that.
6. An important axis of variation in libm function implementations -
though not in the interface provided to users - that doesn't seem to be
mentioned in the metalibm slides (and I couldn't find anything about it in
the metalibm implementation either) is the choice of the type used for
arithmetic within the implementation, as distinct from the type used for
arguments and return values. Thus, when implementing a float function
(for example - similar issues apply to double functions as well), say sinf
(and presuming float is IEEE binary32, as it always is in glibc), there
are several choices, and metalibm would need to allow for this issue when
being used to implement functions for glibc:
(a) Implement using float as the internal implementation type. This is
common in glibc, but maybe not such a good idea in reality. For most
architectures it's likely use of double internally is more efficient for a
given accuracy requirement than using float internally, as it may reduce
the need for precision-extension techniques (Dekker etc.). In certain
cases where float is supported in hardware but double isn't, it's
conceivable using float internally could be better.
(b) Implement using double as the internal implementation type (with
double+double if correct rounding is required *and* just one double isn't
accurate enough for the given inputs). This is likely to be appropriate
in most cases.
(c) Implement using long double as the internal implementation type, where
long double is the x86/ia64/m68k "extended" type (the m68k version being
slightly different from the others). This is indicated on 32-bit x86,
where you simply can't use float or double as internal type reliably
because of the workings of the x87 FPU. (You can set and restore the
precision control bits, and some glibc libm functions do so, but the
handling of values with out-of-range exponent will still depend on which
computed values GCC decides to spill to memory, so any proofs would need
to allow for that, and setting and restoring precision control has its own
performance cost.) It may be indicated on some m68k systems, for similar
reasons. I don't know whether it's appropriate on ia64 (that probably
depends on the performance characteristics of float / double / long double
arithmetic, since the hardware properly supports all of those). It's
probably not appropriate on x86_64, since the SSE floating-point unit is
supposed to be faster than the x87 one. And on 32-bit x86, it may be
appropriate to use STT_GNU_IFUNC so that functions can be implemented
using SSE2 on hardware supporting that and only using x87 for the "long
double" functions or for hardware without SSE2.
(d) Implement using long double as the internal implementation type, where
long double is binary128. Most platforms with binary128 implement it in
software, meaning this is not the most efficient approach. I think S/390
may implement it in hardware, however (at least, GCC has instruction
patterns for it; I don't know if the instructions are actually implemented
in hardware or through kernel emulation).
(e) Implement with software floating point (using soft-fp interfaces
directly). Although more efficient on systems without hardware
floating-point support than using precision-extension techniques layered
on top of software floating point, I don't expect anyone actually to
implement this, as users seriously concerned about efficiency of
floating-point functions are unlikely to be using such processors. So
this option is just listed for completeness.
7. A persistent source of issues in glibc's libm is powerpc with IBM long
double. I don't know if there's anything metalibm can do there; such a
non-IEEE type is generally problematic for lots of things that work well
with IEEE floating-point types. One might imagine it could be used to
generate polynomial coefficients, even if precise error bounds can't
usefully be proved when this type is involved.
8. 12000 function variants (80 per (operation, type) pair), as referred to
in the metalibm slides, would clearly be absurd for glibc. We need to
keep the size of the libm.so binary under control. Also, the interface
provided by libm.so under a particular symbol name needs to be
well-defined, not depending on how glibc is configured, and once such a
symbol is exported we need to support it forever.
Right now, we have *two* variants for most functions:
(i) The default version, with a name such as "sin" - see the message I
referred to under 2. above for the goals for such a function.
(ii) A version __<func>_finite, used when a user program is built with
-ffinite-math-only. For glibc libm, -ffinite-math-only is considered to
imply -fno-math-errno (all this means is no errno settings on underflow,
since -ffinite-math-only implies that no other inputs that could involve
errno setting are valid, and ISO C leaves it implementation-defined
whether errno is set on underflow even when (math_errhandling &
MATH_ERRNO) is nonzero).
The relevant existing GCC options are -ffast-math (an option implying
several others), -funsafe-math-optimizations (which itself implies a few
others), -ffinite-math-only, -fno-math-errno, -fno-signaling-nans
(default), -fno-rounding-math (default), -fcx-limited-range,
-fno-trapping-math, -fno-signed-zeros, -fassociative-math,
-freciprocal-math.
Trying to define libm functions variants for all combinations of these
would be excessive. I suggest that the *most* that makes sense is the
following eight variants, and this could reasonably be cut further.
(a) Correctly rounded. These should be named e.g. crsin following TS
18661-4, when provided at all. They should handle all rounding modes,
produce accurate exceptions including "inexact", and accurate errno.
(b) -ffast-math. This would imply no error or exception guarantees, only
finite arguments and results need be handled, and the result is within a
few ulp of the correct result for inputs within a few ulp of the input
value actually passed to the function (so there could be large errors in
numerically unstable regions).
(c) General-purpose, with goals as described before in 2.; these would get
the linkage names such as "sin", and would handle all rounding modes and
errno and exception (except "inexact") setting. If automatically
generated with error guarantees, say < 1ulp in round-to-nearest and < 2ulp
in other modes. The same goals would apply for all the following variants
with the implied limitations (-ffinite-math-only means no need to handle
non-finite inputs or outputs or to set errno on underflow; -fno-math-errno
means no need to set errno for any inputs; -fno-rounding-math means the
function can assume it is called with round-to-nearest mode).
(d) -ffinite-math-only (these entry points already exist).
(e) -fno-math-errno (in most cases, these could in fact alias the
-ffinite-math-only entry points as all those do at present is bypass
errno-setting wrappers).
(f) -fno-rounding-math.
(g) -fno-math-errno -fno-rounding-math.
(h) -ffinite-math-only -fno-math-errno -fno-rounding-math.
One might reasonably exclude the (e), (f) and (g) entry points to reduce
the interface further, to five entry points per function (and many in
practice wouldn't have the correctly-rounded entry point, at least for
long double).
To keep libm code size down, I'd expect all of (c) to (h) to be
implemented as alternative entry points (maybe aliases in some cases) to a
common core implementation rather than duplicating code for them. At
present this might mean using some form of wrapper functions, as are
presently used for errno setting, but one might imagine adding a GCC
extension to support other forms of multiple entry points without such
separate functions.
To keep the dynamic symbol table size down, entry points other than for
(a) and (c) should only be provided where actually needed; all other cases
require header magic to map a function to an appropriate linkage name, and
such magic should use a minimal number of linkage names; once such a name
is added, we're then stuck with supporting it even if a future
implementation can use an alias. For example, on platforms where long
double has the same representation as double, glibc needs to provide
symbols such as "sinl" (ISO C allows declaring such functions yourself
rather than using <math.h>), although it's an alias to "sin". But there
is no need to provide __sinl_finite, and the ABI size is kept down by
mapping sinl calls with -ffinite-math-only directly to __sin_finite rather
than putting __sinl_finite in the ABI. This applies to all cases where
different functions can be aliases: don't add an entry point to the ABI
unless and until it's needed, whether because it is actually providing
something for which an alias would not suffice, or because it's required
by ISO C or an ISO C extension.
Note that GCC does not predefine macros for -fno-math-errno or
-fno-rounding-math. If you wanted to define entry points to glibc for
those options, you might want to make GCC predefine such macros first.
The same applies for -fno-trapping-math, where I haven't proposed entry
points in the list above but they might make sense in a limited number of
cases where glibc needs to go to special lengths to *avoid* generating
spurious "inexact" exceptions (so -fno-trapping-math could use entry
points that don't worry about spurious exceptions).
9. It would be reasonable to implement the TS 18661-3 functions for
particular IEEE 754-2008 floating-point types. The main thing needed
there would be GCC support for _Float32 etc.; then it would "just" be a
matter of adding aliases and header declarations (without making glibc's
ABI depend on the compiler support). The points made under 8. above apply
to determining what does or does not need an alias in the ABI.
Such an addition could eventually obsolete GCC's libquadmath by providing
the _Float128 functions directly in glibc's libm under the names proposed
for TS 18661-3. It would however be necessary to have the ldbl-128
functions implemented in such a way that they can be used both for long
double (_Float128 functions being aliases) and for _Float128 when that
type is not long double. The fact that functions for binary128 might need
to call the binary128 type by different C names (and use different
suffixes on constants etc.) depending on the architecture is something to
consider in making metalibm generate such functions. (Right now there is
an error-prone manual editing process involved in adapting the ldbl-128
functions for libquadmath.)
10. None of the above considers entry points whose ABI, other than the
symbol name, is different from that for standard libm functions. The main
case for that is probably entry points taking and returning vectors. The
general issue of keeping down the ABI size applies to any such entry
points, as does the issue of avoiding the libm ABI depending on what
vector ISA extensions are supported by the processor for which glibc is
built. Al Grant has suggested the consideration of vector types for C as
part of a discussion of parallel language extensions (discussion starting
at <http://www.open-std.org/pipermail/cplex/2013-June/000010.html>); it's
not clear what if anything might eventually come of that.
--
Joseph S. Myers
joseph@codesourcery.com