This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [RFC] Avoiding _divsi3 call during ld.so bootstrap
- From: John Reiser <jreiser at BitWagon dot com>
- To: libc-ports at sourceware dot org
- Date: Fri, 06 Apr 2007 13:10:17 -0700
- Subject: Re: [RFC] Avoiding _divsi3 call during ld.so bootstrap
- Organization: -
> From: Bob Wilson <bwilson@tensilica.com>
> Date: Fri, 06 Apr 2007 12:24:38 -0700
> To: Richard Henderson <rth@twiddle.net>
> CC: libc-alpha@sourceware.org, Chris Zankel <czankel@tensilica.com>
>
> Richard Henderson wrote:
>
>> On Wed, Apr 04, 2007 at 12:18:22PM -0700, Bob Wilson wrote:
>>
>>> The following line in elf_dynamic_do_rel in elf/do-rel.h is
>>> generating a call to _divsi3:
>>>
>>> r = r + MIN (nrelative, relsize / sizeof (ElfW(Rel)));
>>
>>
>> Do you not have a umulsi3_highpart insn? The compiler should
>> be able to convert this away from a real divide operation...
>
>
> It depends. There is a processor configuration option to support that,
> but not all Xtensa processors will have it. Thanks for the suggestion anyway.
>
> In the unlikely event that anyone has further comments, please follow-up on
> libc-ports, so we don't annoy Ulrich with any more of this "embedded crap".
Implement a subroutine _divsi3 that checks whether the divisor is 12
(which is sizeof(RELA) on a 32-bit target), then takes advantage of
1/12 = 0.000101010101... in binary to perform the division by 12 using the
strategy of umulsi3_highpart (multiply by a binary fraction, take the
high part), as Richard suggests.
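As a rough sketch of where the "binary fraction" multiplier comes from: scale 1/12 by a power of two and round up. The helper name reciprocal_multiplier and its shift parameter are made up for illustration. The 64-bit divide here is meant to run offline (or be folded by the compiler), not inside ld.so.

```c
#include <stdint.h>

/* Hypothetical helper: returns ceil(2^(32+shift) / d).
 * Assumes d != 0 and shift < 32. Intended for computing the constant
 * ahead of time, since it uses a 64-bit divide itself. */
static uint64_t reciprocal_multiplier(uint32_t d, unsigned shift)
{
    uint64_t pow2 = (uint64_t)1 << (32 + shift);
    return (pow2 + d - 1) / d;   /* round up */
}
```

For d = 12 and shift = 3 this gives 0xAAAAAAAB. Other divisors may need a different shift, and some need a multiplier wider than 32 bits, so this is not a general recipe.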
Hint: look at the code which gcc generates for x86 at -O2:
-----
int foo(unsigned a)
{
    return a / 12;
}
-----
movl $0xAAAAAAAB, %eax
mull 8(%ebp)
shrl $3, %edx
movl %edx, %eax
-----
Multiply (u32 x u32 ==> u64) by 0xAAAAAAAB, then take the high 32 bits
shifted right by 3.
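The same multiply-and-shift can be written in portable C, together with a dispatcher of the kind described above. This is only a sketch: div12_mulhi and udiv_special12 are hypothetical names, the operands are treated as unsigned (as in the relsize computation), and the generic fallback is a plain shift-subtract loop standing in for whatever the real routine does.

```c
#include <stdint.h>

/* Hypothetical helper: a / 12 via multiplication by 0xAAAAAAAB. */
static uint32_t div12_mulhi(uint32_t a)
{
    uint64_t product = (uint64_t)a * 0xAAAAAAABu;  /* u32 x u32 ==> u64 */
    return (uint32_t)(product >> (32 + 3));        /* high word, then >> 3 */
}

/* Hypothetical dispatcher: fast path for 12, slow path otherwise.
 * Assumes d != 0. */
static uint32_t udiv_special12(uint32_t n, uint32_t d)
{
    uint32_t q = 0;
    uint64_t r = 0;   /* 64-bit so that r << 1 cannot overflow */
    int i;

    if (d == 12)
        return div12_mulhi(n);

    /* Generic restoring division, one quotient bit per iteration. */
    for (i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```

On a target without a widening multiply, the product in div12_mulhi would itself have to be built from shifts and adds.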
If you have a double-width shift, then you can perform the multiplication
in logarithmic steps because of the pattern of the binary expansion of 1/12:
uint64_t a,b,c,d,e,f;
uint32_t q;
a = (uint64_t)relsize<<1; /* a = 0x00000002 * relsize; widen before shifting */
b = a + (a<< 2); /* b = 0x0000000A * relsize; */
c = b + (b<< 4); /* c = 0x000000AA * relsize; */
d = c + (c<< 8); /* d = 0x0000AAAA * relsize; */
e = d + (d<<16); /* e = 0xAAAAAAAA * relsize; */
f = e + relsize; /* f = 0xAAAAAAAB * relsize; */
q = (uint32_t)(f>>(32 + 3)); /* q = relsize / 12 */
If you do not have a double-width shift, then you must do it
two bits at a time for 16 iterations, using 64-bit addition.
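A minimal sketch of that add-only variant, assuming a two-word accumulator and treating "shift left by one" as "add the value to itself". The type u64pair and the functions add64 and div12_add_only are invented names for illustration.

```c
#include <stdint.h>

/* Hypothetical 64-bit value held as two 32-bit words. */
typedef struct { uint32_t hi, lo; } u64pair;

/* 64-bit addition built from 32-bit adds plus a carry. */
static u64pair add64(u64pair x, u64pair y)
{
    u64pair s;
    s.lo = x.lo + y.lo;
    s.hi = x.hi + y.hi + (s.lo < x.lo);  /* carry out of the low word */
    return s;
}

/* relsize / 12 using only 64-bit additions and one 32-bit shift. */
static uint32_t div12_add_only(uint32_t relsize)
{
    u64pair n   = { 0, relsize };
    u64pair n2  = add64(n, n);           /* 2 * relsize: digit pair "10" */
    u64pair acc = { 0, 0 };
    int i;

    for (i = 0; i < 16; i++) {           /* 16 pairs of multiplier bits */
        acc = add64(acc, acc);           /* acc * 2 */
        acc = add64(acc, acc);           /* acc * 4 */
        acc = add64(acc, n2);            /* append "10" */
    }                                    /* acc = 0xAAAAAAAA * relsize */
    acc = add64(acc, n);                 /* acc = 0xAAAAAAAB * relsize */
    return acc.hi >> 3;                  /* same as (64-bit acc) >> (32 + 3) */
}
```

The final shift only touches the high word, so no double-width shift is needed anywhere.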