This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Re: [RFC] Avoiding _divsi3 call during ld.so bootstrap

From: John Reiser <jreiser at BitWagon dot com>
To: libc-ports at sourceware dot org
Date: Fri, 06 Apr 2007 13:10:17 -0700
Subject: Re: [RFC] Avoiding _divsi3 call during ld.so bootstrap
Organization: -

> From: Bob Wilson <bwilson@tensilica.com>
> Date: Fri, 06 Apr 2007 12:24:38 -0700
> To: Richard Henderson <rth@twiddle.net>
> CC: libc-alpha@sourceware.org, Chris Zankel <czankel@tensilica.com>
> 
> Richard Henderson wrote:
> 
>> On Wed, Apr 04, 2007 at 12:18:22PM -0700, Bob Wilson wrote:
>>
>>> The following line in elf_dynamic_do_rel in elf/do-rel.h is
>>> generating a call to _divsi3:
>>>
>>>       r = r + MIN (nrelative, relsize / sizeof (ElfW(Rel)));
>>
>>
>> Do you not have a umulsi3_highpart insn?  The compiler should
>> be able to convert this away from a real divide operation...
> 
> 
> It depends.  There is a processor configuration option to support that,
> but not all Xtensa processors will have it.  Thanks for the suggestion anyway.
> 
> In the unlikely event that anyone has further comments, please follow-up on
> libc-ports, so we don't annoy Ulrich with any more of this "embedded crap".

Implement a subroutine _divsi3 which checks whether the divisor is 12
(which is sizeof(RELA) on a 32-bit target), then takes advantage of
1/12 = 0.000101010101... in binary (that is, 1/3 = 0.010101... shifted
right by two bits) to perform the division by 12 using the strategy of
umulsi3_highpart (multiply by a binary fraction, take the high part)
as Richard suggests.

Hint: look at the code which gcc generates for x86 at -O2:
-----
int foo(unsigned a)
{
        return a / 12;
}
-----
        movl    $0xAAAAAAAB, %eax
        mull    8(%ebp)
        shrl    $3, %edx
        movl    %edx, %eax
-----
Multiply (u32 x u32 ==> u64) by 0xAAAAAAAB, then shift the high 32 bits
right by 3.  The total shift is 35 because 0xAAAAAAAB = ceil(2^35 / 12).
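
For illustration only, here is a rough C sketch of how the pieces might fit
together.  The names udiv12, udiv_slow and divsi3_sketch are made up for this
example (the real routine would have to be called _divsi3 and would probably
be written in assembly), and the shift-and-subtract fallback is just one
possible way to handle divisors other than 12.  It assumes the compiler can
do a 32x32->64 multiply without calling back into a division helper.

```c
#include <stdint.h>

/* Hypothetical helper: n / 12 via multiply-high.
   0xAAAAAAAB == ceil(2^35 / 12), so (n * 0xAAAAAAAB) >> 35 == n / 12
   for every 32-bit unsigned n. */
static uint32_t udiv12(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0xAAAAAAABu;
    return (uint32_t)(product >> 35);
}

/* Hypothetical fallback: plain shift-and-subtract division.
   Requires 0 < d <= 2^31, which holds for the magnitude of any
   nonzero int32_t, so the running remainder cannot overflow. */
static uint32_t udiv_slow(uint32_t n, uint32_t d)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}

/* Sketch of a signed divide that special-cases a divisor of 12.
   Division by zero is not handled.  Converting the negated quotient
   back to int32_t relies on the usual two's-complement wraparound. */
static int32_t divsi3_sketch(int32_t a, int32_t b)
{
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t uq = (ub == 12u) ? udiv12(ua) : udiv_slow(ua, ub);
    int negative = (a < 0) != (b < 0);
    return negative ? (int32_t)(0u - uq) : (int32_t)uq;
}
```

The check for 12 costs one compare on the fast path, and everything else
takes the slow loop, which should be rare during ld.so bootstrap.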

If you have a double-width shift, then you can perform the multiplication
in logarithmic steps because of the pattern of the binary expansion of 1/12:
	uint64_t a, b, c, d, e, f;
	uint32_t q;
	a = (uint64_t)relsize << 1;  /* a = 0x00000002 * relsize; widen first */
	b = a + (a <<  2);  /* b = 0x0000000A * relsize; */
	c = b + (b <<  4);  /* c = 0x000000AA * relsize; */
	d = c + (c <<  8);  /* d = 0x0000AAAA * relsize; */
	e = d + (d << 16);  /* e = 0xAAAAAAAA * relsize; */
	f = e + relsize;    /* f = 0xAAAAAAAB * relsize; */
	q = (uint32_t)(f >> (32 + 3));  /* relsize / 12 */

If you do not have a double-width shift, then you must do it
two bits at a time for 16 iterations, using 64-bit addition.
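
One way to read "two bits at a time" is sketched below.  This is only my
interpretation, and div12_addonly is a made-up name.  Each iteration
quadruples the accumulator with two 64-bit additions and then adds
2 * relsize, which appends the binary digits "10" to the multiplier.
After 16 iterations the accumulator holds 0xAAAAAAAA * relsize.  The final
step only picks the high 32-bit word and shifts it right by 3, so no
double-width shift is needed.

```c
#include <stdint.h>

/* Hypothetical helper: n / 12 using only 64-bit additions,
   plus a single-word shift of the high half at the end. */
static uint32_t div12_addonly(uint32_t n)
{
    uint64_t two_n = (uint64_t)n + n;  /* 2 * n */
    uint64_t acc = 0;

    for (int i = 0; i < 16; i++) {
        acc += acc;    /* acc * 2 */
        acc += acc;    /* acc * 4 */
        acc += two_n;  /* append binary digits "10" to the multiplier */
    }
    acc += n;          /* 0xAAAAAAAA * n + n == 0xAAAAAAAB * n */

    /* High word, shifted right by 3: a total shift of 35. */
    return (uint32_t)(acc >> 32) >> 3;
}
```

On a 32-bit machine each 64-bit addition would be an add followed by an
add-with-carry, and the high word is simply the upper register of the pair.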

--

Follow-Ups:
- Re: [RFC] Avoiding _divsi3 call during ld.so bootstrap
  - From: Bob Wilson
