This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.

Re: [ARM] architecture specific subdirectories, optimised memchr and some questions

From: "Joseph S. Myers" <joseph at codesourcery dot com>
To: "Dr. David Alan Gilbert" <david dot gilbert at linaro dot org>
Cc: libc-ports at sourceware dot org, ports at linaro dot org
Date: Tue, 2 Aug 2011 15:06:32 +0000 (UTC)
Subject: Re: [ARM] architecture specific subdirectories, optimised memchr and some questions
References: <20110715181101.GA20980@davesworkthinkpad>

On Fri, 15 Jul 2011, Dr. David Alan Gilbert wrote:

>   * Is the preconfigure the right place to check for the current architecture

Yes.

>   * and is it right to set $submachine there?

Well, base_machine and machine are what's meant to be set automatically;
submachine is meant to be determined by --with-cpu, which also causes a
-mcpu= or -march= option to be passed.

>   * Why did the preconfigure previously append /arm to the end of $machine?

It puts it at the beginning, not the end, which fits in with using target
triplets (the idea being that you might configure glibc for
armv5-linux-gnueabi, for example, and get arm/eabi/armv5).  However, the
list of ARM versions in config.sub (in config.git upstream) is rather old,
which is one reason among others why this use of target triplets isn't
ideal; logically I think your approach of testing the compiler is better.
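
A minimal sketch of what testing the compiler can amount to, assuming the
GCC predefined macros __ARM_ARCH_6T2__ and __ARM_ARCH_7A__.  The function
name and the returned directory names are illustrative assumptions, not
taken from the patch:

```c
/* Sketch: classify the target ARM architecture from the compiler's
   predefined macros.  arm_subdir_for_compiler and the directory names
   it returns are hypothetical. */
#include <stddef.h>

static const char *arm_subdir_for_compiler(void)
{
#if defined(__ARM_ARCH_7A__)
    return "armv7-a";
#elif defined(__ARM_ARCH_6T2__)
    return "armv6t2";
#elif defined(__arm__)
    return "";      /* some other ARM architecture: no submachine */
#else
    return NULL;    /* not compiling for 32-bit ARM at all */
#endif
}
```

A preconfigure fragment would more likely run the preprocessor over such
conditionals and inspect its output, but the macros consulted would be the
same.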

>   * Ideally I don't think the architecture specifics should be in the eabi
>     subdir; they should be at the top (they aren't eabi specific) - but I
>     can't see a sensible way to rework the search order to do that
>     - suggestions?

The simple approach is for each arm/eabi/$submachine directory to have an 
Implies file pointing to the directory outside eabi/.

Otherwise, although ports has historically been used as a dumping ground
for random code removed from libc, I don't think that's the right
approach.  We have version control to preserve old versions of code, and
ports should hold those ports of glibc that aren't in libc but are
reasonably close to a working state, not code that's been broken or
obsolete for a long time.  By now I think it would be reasonable to remove
all the old-ABI ARM code from ports, moving the eabi code up a directory
level and eliminating the complexities of claiming to support two
different ABIs.  (The only ports that I think are in some semblance of a
maintained state are alpha, arm/eabi, hppa, m68k, mips and powerpc, all of
them for Linux only at present.)

>   * Does the memchr boilerplate look OK? (It seems to work!)  The code is
>     thumb-2 only which is a little unusual, but the 6T2 and 7-a that it
>     supports can both do that.

Has it been tested (with the glibc testsuite) for both big and little 
endian?

The code certainly needs CFI directives for when it adjusts the stack 
frame and saves/restores call-preserved registers, so that the debugger 
can backtrace properly when stopping things anywhere in this code.  See 
sysdeps/arm/memcpy.S for example.

>   * Given this directory structure - where would I put some code that
>     was Neon specific? It's a feature that's available in 7-a variants
>     (and later?)   arch/arm/eabi/armv7-a/neon?

That's a plausible location for such code selected at configure time - but 
also consider the use of STT_GNU_IFUNC when multi-arch is enabled.
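
A rough sketch of IFUNC-style selection using GCC's ifunc function
attribute, with a plain-C fallback where that attribute is unavailable.
All demo_* names and cpu_has_neon() are hypothetical, and the "NEON"
version is only a placeholder that calls the generic code:

```c
/* Sketch of STT_GNU_IFUNC dispatch.  All demo_* names and cpu_has_neon()
   are hypothetical; nothing here is glibc's actual multi-arch code. */
#include <stddef.h>

/* Byte-at-a-time search, standing in for the generic implementation. */
static void *demo_memchr_generic(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    for (size_t i = 0; i < n; i++)
        if (p[i] == (unsigned char) c)
            return (void *) (p + i);
    return NULL;
}

/* Placeholder for a NEON version; a real one would be in assembly. */
static void *demo_memchr_neon(const void *s, int c, size_t n)
{
    return demo_memchr_generic(s, c, n);
}

/* Hypothetical feature probe.  A real resolver would consult the
   hardware capabilities reported for the running processor. */
static int cpu_has_neon(void)
{
    return 0;
}

typedef void *(*demo_memchr_fn)(const void *, int, size_t);

#if defined(__has_attribute)
# if __has_attribute(ifunc)
#  define DEMO_HAVE_IFUNC 1
# endif
#endif

#ifdef DEMO_HAVE_IFUNC
/* The dynamic linker runs the resolver when it relocates the symbol and
   binds demo_memchr to whichever implementation is returned. */
static demo_memchr_fn demo_memchr_resolver(void)
{
    return cpu_has_neon() ? demo_memchr_neon : demo_memchr_generic;
}

void *demo_memchr(const void *s, int c, size_t n)
    __attribute__((ifunc("demo_memchr_resolver")));
#else
/* Fallback without ifunc support: make the choice on every call. */
void *demo_memchr(const void *s, int c, size_t n)
{
    return (cpu_has_neon() ? demo_memchr_neon : demo_memchr_generic)(s, c, n);
}
#endif
```

Callers simply call demo_memchr(); the choice of implementation is
invisible to them, which is what makes this attractive compared with a
purely configure-time directory selection.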

Things get more complicated when you consider features not even available 
for all processors with NEON - fused multiply-add in VFPv4, for example.  
(The GCC side of that - built-in function support for fma - hasn't been 
done either, although other targets have it in 4.6.)  Here are my notes on 
how ideally the fma functions should be implemented for ARM (largely 
independent of your changes except that they may provide the framework for 
e.g. VFPv4-specific code in glibc):

1. Suppose glibc is being built for a VFPv4 multilib.  (There is no 
predefined preprocessor macro to say that VFPv4 is in use.  There should 
be one.)  Then it should use the VFPv4 fused instructions, whether through 
.S files, inline assembly or GCC built-in functions (once added).  The 
existing libc code is correct but not optimal.

2. Suppose glibc is being built for a VFP multilib, not v4.  The existing 
libc code is correct, but if at runtime the processor is v4 then it can do 
better via IFUNC.  This is essentially what x86 and x86_64 do in IFUNC 
configurations.

3. Suppose glibc is being built for a non-VFP multilib.  Then it is 
optimal to use the first of (VFPv4 implementation, plain VFP, soft-fp[*]) 
that will work on the processor used at runtime.  (Even if the __aeabi_* helper 
functions are made to use IFUNC in future so they use VFP operations on 
VFP processors - which may well be desirable, though it has some tricky 
aspects - and even if they also get optional support for exceptions and 
rounding modes in the soft-float case, a straight soft-fp implementation 
of FMA is still going to be faster than the generic one layered on other 
soft-fp operations.)  So in the absence of IFUNC, or in the presence of 
IFUNC but when glibc is being built with options incompatible with 
enabling VFP (such as iWMMXt), a soft-fp version should be used.  In the 
presence of IFUNC, it could be used to select between a VFPv4 version, the 
generic version built with -mfpu=vfp -mfloat-abi=softfp, and the soft-fp 
version - though simply using the soft-fp version always in the 
non-VFP-multilib case may also be reasonable.
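
The three cases above can be summarised as a small decision function.
This is only a sketch of the selection logic: the capability bits, enum
names and choose_fma() are all hypothetical (the bits are not the kernel's
HWCAP values), and vfp_allowed stands for "the build options are
compatible with enabling VFP", which would be false for iWMMXt, for
example.

```c
#include <stdbool.h>

/* Hypothetical capability bits; these are NOT the kernel's HWCAP values. */
enum {
    DEMO_CAP_VFP   = 1u << 0,
    DEMO_CAP_VFPV4 = 1u << 1
};

/* Which multilib glibc is being built for (cases 3, 2 and 1 above). */
enum build_kind { BUILD_NON_VFP, BUILD_VFP, BUILD_VFPV4 };

/* Which fma implementation ends up being used. */
enum fma_impl { FMA_SOFT_FP, FMA_GENERIC_VFP, FMA_VFPV4 };

static enum fma_impl choose_fma(enum build_kind build, bool have_ifunc,
                                bool vfp_allowed, unsigned caps)
{
    switch (build) {
    case BUILD_VFPV4:
        /* Case 1: fused instructions can be used unconditionally. */
        return FMA_VFPV4;
    case BUILD_VFP:
        /* Case 2: generic code is correct; IFUNC can do better on v4. */
        if (have_ifunc && (caps & DEMO_CAP_VFPV4))
            return FMA_VFPV4;
        return FMA_GENERIC_VFP;
    case BUILD_NON_VFP:
    default:
        /* Case 3: without IFUNC, or if VFP cannot be enabled, soft-fp. */
        if (!have_ifunc || !vfp_allowed)
            return FMA_SOFT_FP;
        if (caps & DEMO_CAP_VFPV4)
            return FMA_VFPV4;
        if (caps & DEMO_CAP_VFP)
            return FMA_GENERIC_VFP;
        return FMA_SOFT_FP;
    }
}
```

Always returning FMA_SOFT_FP in the BUILD_NON_VFP case would be the
simpler alternative mentioned at the end of case 3.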

[*] This soft-fp version of fma doesn't actually exist.  Steve Munroe had 
a version in PR 3268, but it's not the right way of implementing fma using 
soft-fp.  Doing it properly means splitting up the multiplication macros 
to expose widening multiply, and implementing _FP_FMA.  The result could 
then be used in GCC as well (to replace fmsub in 
config/rs6000/darwin-ldouble.c which is currently used in implementing IBM 
long double for soft-float Power GNU/Linux).  I expect this could end up 
being a few days' work to get it working properly everywhere (including 
testing with Jakub's random test generator at 
<http://sourceware.org/ml/libc-hacker/2010-10/msg00005.html>).  It's 
relevant for getting fma working properly on any target without runtime 
support for exceptions and rounding modes, including but not limited to 
older ARM processors.

-- 
Joseph S. Myers
joseph@codesourcery.com
