This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [ARM] architecture specific subdirectories, optimised memchr and some questions
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: "Dr. David Alan Gilbert" <david dot gilbert at linaro dot org>
- Cc: libc-ports at sourceware dot org, ports at linaro dot org
- Date: Tue, 2 Aug 2011 15:06:32 +0000 (UTC)
- Subject: Re: [ARM] architecture specific subdirectories, optimised memchr and some questions
- References: <20110715181101.GA20980@davesworkthinkpad>
On Fri, 15 Jul 2011, Dr. David Alan Gilbert wrote:
> * Is the preconfigure the right place to check for the current architecture?
Yes.
> * and is it right to set $submachine there?
Well, setting base_machine and machine is what's meant to be done
automatically, with submachine being determined by --with-cpu and causing
a -mcpu= or -march= option to be passed.
> * Why did the preconfigure previously append /arm to the end of $machine?
It puts it at the beginning, not the end, and this fits in with using
target triplets (the idea being you might configure glibc for
armv5-linux-gnueabi, for example, and get arm/eabi/armv5). However, the
list of ARM versions in config.sub (in config.git upstream) is rather old,
which is one reason among others why this use of target triplets isn't
ideal; logically I think your approach of testing the compiler is better.
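As an illustration of testing the compiler rather than the triplet, here is
a minimal C sketch that maps GCC's predefined ARM architecture macros to a
subdirectory name. The helper name arm_submachine and the directory names it
returns are assumptions made for this example, not what the actual
preconfigure fragment does; __ARM_ARCH_7A__ and __ARM_ARCH_6T2__ are the
macros GCC predefines for those architectures.

```c
/* Hypothetical helper: report which architecture-specific subdirectory
   would match the compiler's current -march=/-mcpu= settings.
   Returns "" when none of the tested macros is defined (for example,
   on a non-ARM compiler or an older ARM architecture).  */
static const char *arm_submachine(void)
{
#if defined(__ARM_ARCH_7A__)
    return "armv7-a";   /* assumed directory name */
#elif defined(__ARM_ARCH_6T2__)
    return "armv6t2";   /* assumed directory name */
#else
    return "";
#endif
}
```

A preconfigure script would more likely run just the preprocessor over such
conditionals and inspect the output, since the target compiler's binaries
cannot in general be executed on the build machine.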
> * Ideally I don't think the architecture specifics should be in the eabi
> subdir; they should be at the top (they aren't eabi specific) - but I
> can't see a sensible way to rework the search order to do that
> - suggestions?
The simple approach is for each arm/eabi/$submachine directory to have an
Implies file pointing to the directory outside eabi/.
Otherwise, while ports has historically been used as a dumping ground for
random code removed from libc, I don't think that's the right approach; we
have version control to preserve old versions of code and ports should
have those ports of glibc that aren't in libc but are reasonably close to
a working state, not code that's been broken or obsolete for a long time.
By now I think it would be reasonable to remove all the old-ABI ARM code
from ports, which would mean moving the eabi code up a directory level and
eliminating the complexities of claiming to support two different ABIs.
(The only ports that I think are in some semblance of a maintained state
are alpha, arm/eabi, hppa, m68k, mips, powerpc - all of them for Linux only
at present.)
> * Does the memchr boiler plate look OK? (It seems to work!) The code is
> thumb-2 only which is a little unusual, but the 6T2 and 7-a that it
> supports can both do that.
Has it been tested (with the glibc testsuite) for both big and little
endian?
The code certainly needs CFI directives for when it adjusts the stack
frame and saves/restores call-preserved registers, so that the debugger
can backtrace properly when stopping things anywhere in this code. See
sysdeps/arm/memcpy.S for example.
> * Given this directory structure - where would I put some code that
> was Neon specific? It's a feature that's available in 7-a variants
> (and later?) arch/arm/eabi/armv7-a/neon?
That's a plausible location for such code selected at configure time - but
also consider the use of STT_GNU_IFUNC when multi-arch is enabled.
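To make the contrast with configure-time selection concrete, here is a
minimal C sketch of choosing an implementation at runtime. It emulates the
idea with a function pointer resolved on first call; it is not real
STT_GNU_IFUNC, where the dynamic linker calls the resolver while processing
relocations (with GCC, via the ifunc function attribute). All names here
(my_memchr, memchr_generic, memchr_neon, cpu_has_neon, resolve_memchr) are
hypothetical, and the NEON variant is only a placeholder.

```c
#include <stddef.h>

typedef void *(*memchr_fn)(const void *, int, size_t);

/* Portable byte loop; stands in for the generic implementation.  */
static void *memchr_generic(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    for (size_t i = 0; i < n; i++)
        if (p[i] == (unsigned char)c)
            return (void *)(p + i);
    return NULL;
}

/* Placeholder for a NEON version (hypothetical); it simply forwards
   to the generic code in this sketch.  */
static void *memchr_neon(const void *s, int c, size_t n)
{
    return memchr_generic(s, c, n);
}

/* Hypothetical capability probe.  Real code would consult the hardware
   capability bits supplied by the kernel; this stub always says no.  */
static int cpu_has_neon(void)
{
    return 0;
}

/* Plays the role of an IFUNC resolver: pick one implementation.  */
static memchr_fn resolve_memchr(void)
{
    return cpu_has_neon() ? memchr_neon : memchr_generic;
}

/* Public entry point: resolve once, then always call through.  */
static void *my_memchr(const void *s, int c, size_t n)
{
    static memchr_fn impl;
    if (impl == NULL)
        impl = resolve_memchr();
    return impl(s, c, n);
}
```

With real IFUNC the indirection through a checked pointer disappears,
because the symbol is bound directly to the chosen implementation.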
Things get more complicated when you consider features not even available
for all processors with NEON - fused multiply-add in VFPv4, for example.
(The GCC side of that - built-in function support for fma - hasn't been
done either, although other targets have it in 4.6.) Here are my notes on
how ideally the fma functions should be implemented for ARM (largely
independent of your changes except that they may provide the framework for
e.g. VFPv4-specific code in glibc):
1. Suppose glibc is being built for a VFPv4 multilib. (There is no
predefined preprocessor macro to say that VFPv4 is in use. There should
be one.) Then it should use the VFPv4 fused instructions, whether through
.S files, inline assembly or GCC built-in functions (once added). The
existing libc code is correct but not optimal.
2. Suppose glibc is being built for a VFP multilib, not v4. The existing
libc code is correct, but if at runtime the processor is v4 then it can do
better via IFUNC. This is essentially what x86 and x86_64 do in IFUNC
configurations.
3. Suppose glibc is being built for a non-VFP multilib. Then it is
optimal to use the first of (VFPv4 implementation, plain VFP, soft-fp[*])
that will work on the processor used at runtime. (Even if the __aeabi_* helper
functions are made to use IFUNC in future so they use VFP operations on
VFP processors - which may well be desirable, though it has some tricky
aspects - and even if they also get optional support for exceptions and
rounding modes in the soft-float case, a straight soft-fp implementation
of FMA is still going to be faster than the generic one layered on other
soft-fp operations.) So in the absence of IFUNC, or in the presence of
IFUNC but when glibc is being built with options incompatible with
enabling VFP (such as iWMMXt), a soft-fp version should be used. In the
presence of IFUNC, it could be used to select between a VFPv4 version, the
generic version built with -mfpu=vfp -mfloat-abi=softfp, and the soft-fp
version - though simply using the soft-fp version always in the
non-VFP-multilib case may also be reasonable.
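The three cases above can be summarised as a decision function. This is
only a sketch of the selection logic in the notes; the enum, struct and
function names are invented for the example, and no such code exists in
glibc.

```c
#include <stdbool.h>

/* How the multilib was built (hypothetical names).  */
enum build_fpu { BUILD_VFPV4, BUILD_VFP, BUILD_NO_VFP };

/* Which fma implementation ends up being used.  */
enum fma_impl {
    FMA_VFPV4,        /* VFPv4 fused instructions */
    FMA_GENERIC,      /* existing libc code, as built for the multilib */
    FMA_GENERIC_VFP,  /* generic code built with -mfpu=vfp -mfloat-abi=softfp */
    FMA_SOFT_FP       /* soft-fp version (does not exist yet, see [*]) */
};

struct fma_env {
    enum build_fpu build;
    bool have_ifunc;   /* IFUNC available in this configuration */
    bool vfp_allowed;  /* false if build options rule out VFP, e.g. iWMMXt */
    bool cpu_vfp;      /* runtime processor has VFP */
    bool cpu_vfpv4;    /* runtime processor has VFPv4 */
};

static enum fma_impl choose_fma(const struct fma_env *e)
{
    switch (e->build) {
    case BUILD_VFPV4:
        /* Case 1: always use the fused instructions.  */
        return FMA_VFPV4;
    case BUILD_VFP:
        /* Case 2: generic code is correct; IFUNC can do better on v4.  */
        return (e->have_ifunc && e->cpu_vfpv4) ? FMA_VFPV4 : FMA_GENERIC;
    case BUILD_NO_VFP:
    default:
        /* Case 3: without IFUNC, or when VFP cannot be enabled,
           fall back to soft-fp.  */
        if (!e->have_ifunc || !e->vfp_allowed)
            return FMA_SOFT_FP;
        if (e->cpu_vfpv4)
            return FMA_VFPV4;
        if (e->cpu_vfp)
            return FMA_GENERIC_VFP;
        return FMA_SOFT_FP;
    }
}
```

The simpler alternative mentioned for case 3, always using soft-fp in the
non-VFP multilib, corresponds to returning FMA_SOFT_FP unconditionally in
the last branch.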
[*] This soft-fp version of fma doesn't actually exist. Steve Munroe had
a version in PR 3268, but it's not the right way of implementing fma using
soft-fp. Doing it properly means splitting up the multiplication macros
to expose widening multiply, and implementing _FP_FMA. The result could
then be used in GCC as well (to replace fmsub in
config/rs6000/darwin-ldouble.c which is currently used in implementing IBM
long double for soft-float Power GNU/Linux). I expect this could end up
being a few days' work to get it working properly everywhere (including
testing with Jakub's random test generator at
<http://sourceware.org/ml/libc-hacker/2010-10/msg00005.html>). It's
relevant for getting fma working properly on any target without runtime
support for exceptions and rounding modes, including but not limited to
older ARM processors.
--
Joseph S. Myers
joseph@codesourcery.com