This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [COMMITTED] PowerPC: Remove 64 bits instructions in PPC32 code
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 26 May 2014 16:04:18 -0500
- Subject: Re: [COMMITTED] PowerPC: Remove 64 bits instructions in PPC32 code
- References: <53834CE6 dot 2080802 at linux dot vnet dot ibm dot com> <20140526181202 dot GB26150 at gate dot crashing dot org> <53839742 dot 5040906 at linux dot vnet dot ibm dot com>
> >> This patch replaces insrdi with insrwi in the powerpc32 assembly. Although the
> >> 64-bit instructions are not wrong, since all POWER chips supported in 32-bit
> >> mode are 64-bit and do not raise an illegal instruction exception when running
> >> them, valgrind fails, reporting an invalid instruction.
> > This code is CPU-specific; as you say, those CPUs can use rldimi just
> > fine. The reason the code uses rldimi instead of rlwimi is because
> > it is faster (at least on power4, power5). Fix valgrind instead?
> >
> >
> > Segher
> >
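As a rough illustration of why the substitution discussed above leaves the low 32 bits unchanged, here is a minimal C sketch that models the two extended mnemonics. The function names (model_insrwi, model_insrdi, low_words_agree, self_check) and the operand values are illustrative assumptions, not taken from the glibc patch. It models only the architectural result, using the PowerPC convention that bit 0 is the most significant bit, and says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "insrwi rA,rS,n,b": insert the low-order n bits of rS into
   rA at 32-bit big-endian bit positions b .. b+n-1.
   Assumes 1 <= n and b + n <= 32. */
uint32_t model_insrwi(uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
    unsigned shift = 32 - (b + n);   /* distance of the field from the LSB */
    uint32_t field = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    uint32_t mask = field << shift;
    return (ra & ~mask) | ((rs << shift) & mask);
}

/* Model of "insrdi rA,rS,n,b": the same operation on 64-bit registers,
   with b counted from the most significant bit of the doubleword.
   Assumes 1 <= n and b + n <= 64. */
uint64_t model_insrdi(uint64_t ra, uint64_t rs, unsigned n, unsigned b)
{
    unsigned shift = 64 - (b + n);
    uint64_t field = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    uint64_t mask = field << shift;
    return (ra & ~mask) | ((rs << shift) & mask);
}

/* Returns 1 when insrwi with offset b and insrdi with offset b + 32
   produce the same low word. Assumes 1 <= n and b + n <= 32. */
int low_words_agree(uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
    uint32_t w = model_insrwi(ra, rs, n, b);
    uint32_t d = (uint32_t)model_insrdi(ra, rs, n, b + 32);
    return w == d;
}

/* Hypothetical example: replicate one byte across a word. */
void self_check(void)
{
    uint32_t c = 0xab;
    c = model_insrwi(c, c, 8, 16);   /* 0x000000ab becomes 0x0000abab */
    assert(c == 0x0000ababu);
    c = model_insrwi(c, c, 16, 0);   /* 0x0000abab becomes 0xabababab */
    assert(c == 0xababababu);
    assert(low_words_agree(0x12345678u, 0xdeadbeefu, 12, 4));
}
```

Calling self_check() from a test program exercises both models. The two forms differ only in how the bit offset is counted, so under the stated preconditions the remaining question in this thread is speed, not correctness.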
> Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:
>
> > ./test
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 10
>
> And by issuing 16 instructions per test function I get:
>
> > ./test
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 13
>
> A newer processor (POWER7) also shows the same behavior.
On a POWER7 I get that rlwimi is almost twice as slow as rldimi,
just as expected. Because you constructed your test with a blr
immediately after a single rl*imi, you get only one per dispatch
group no matter what.
> And the instructions
> are not in a hot path in the code (it is only called once), so I hardly consider
> this a performance regression.
That might well be. But see http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
where (part of) this code was added.
> Anyway, I would prefer to stay consistent and use only 32-bit instructions in
> 32-bit assembly code, to avoid such issues with external tools (valgrind is only
> an example) and to allow possible future implementations on different chips that
> do not implement the 64-bit instructions to use the powerN code.
I find this not a convincing argument at all. But it's not my call ;-)
Segher