This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [RFC] Clean up SSE variable shifts
- From: "Lu, Hongjiu" <hongjiu dot lu at intel dot com>
- To: Richard Henderson <rth at twiddle dot net>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Sat, 21 Aug 2010 13:28:31 -0700
- Subject: RE: [RFC] Clean up SSE variable shifts
- References: <4C6EF1B7.7020508@twiddle.net>
Your email doesn't show up on libc-alpha.
Your patch has the wrong path for varshift.h.
>
> (1) Instead of having the compiler generate a jump table, use a
> computed branch inside inline assembly.
Have you compared the generated code against the C version in varshift.h?
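For context, a switch-based C variable shift looks roughly like the sketch below (illustrative only, not the actual varshift.h code; the function name is made up). Since psrldq only takes an immediate byte count, the compiler dispatches on the count and typically emits a jump table over the 16 fixed-immediate forms:

```c
#include <emmintrin.h>

/* Illustrative sketch: psrldq requires an immediate, so a C variable
   byte shift must select among 16 fixed-immediate forms.  GCC usually
   compiles this switch into a jump table.  */
static __m128i
shift_right_var (__m128i value, unsigned int shift)
{
  switch (shift)
    {
    case 0:  return value;
    case 1:  return _mm_srli_si128 (value, 1);
    case 2:  return _mm_srli_si128 (value, 2);
    case 3:  return _mm_srli_si128 (value, 3);
    case 4:  return _mm_srli_si128 (value, 4);
    case 5:  return _mm_srli_si128 (value, 5);
    case 6:  return _mm_srli_si128 (value, 6);
    case 7:  return _mm_srli_si128 (value, 7);
    case 8:  return _mm_srli_si128 (value, 8);
    case 9:  return _mm_srli_si128 (value, 9);
    case 10: return _mm_srli_si128 (value, 10);
    case 11: return _mm_srli_si128 (value, 11);
    case 12: return _mm_srli_si128 (value, 12);
    case 13: return _mm_srli_si128 (value, 13);
    case 14: return _mm_srli_si128 (value, 14);
    case 15: return _mm_srli_si128 (value, 15);
    }
  return _mm_setzero_si128 ();
}
```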
> It's tempting to actually share code here, and generate the table out-
> of-line with entries like
>
> psrldq $1, %xmm0
> ret
>
> and use call *%1 in the inline assembly. The use of
>
> register __m128i value __asm__("%xmm0");
>
> could be used to restrict the compiler to the single register
> supported by the out-of-line table. It doesn't look like this would
> unduly hamper the compiler in the places it's used.
>
> There are currently 5 copies of this jump table in libc.
> We'd save 4*8*16 = 512 bytes of code space with the out-of-line
> version.
What is the performance impact of the extra function call
vs. multiple copies of the same jump table?
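The out-of-line table Richard describes could be sketched as below (x86-64, AT&T syntax; the symbol shift_right_var_asm and the exact layout are assumptions for illustration, not his patch). Each entry is "psrldq $N; ret" padded to 8 bytes, and the computed branch indexes into it, so the value must live in %xmm0, the one register the table operates on:

```c
#include <emmintrin.h>

/* Hypothetical out-of-line table: 16 entries of "psrldq $N; ret",
   each padded to 8 bytes, reached by a computed branch.  */
__asm__ (".text\n\t"
         ".globl shift_right_var_asm\n\t"
         ".type shift_right_var_asm, @function\n"
         "shift_right_var_asm:\n\t"
         "lea 1f(%rip), %rax\n\t"         /* base of the table */
         "lea (%rax,%rdi,8), %rax\n\t"    /* entry = base + 8*shift */
         "jmp *%rax\n\t"
         ".p2align 3\n"
         "1:\n\t"
         ".irp i, 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\n\t"
         "psrldq $\\i, %xmm0\n\t"         /* 5 bytes */
         "ret\n\t"                        /* 1 byte, padded to 8 */
         ".p2align 3\n\t"
         ".endr\n\t"
         ".size shift_right_var_asm, .-shift_right_var_asm");

/* SysV ABI puts the __m128i in %xmm0 and the count in %rdi,
   matching what the table expects.  */
extern __m128i shift_right_var_asm (__m128i value, long shift);
```

Sharing one such table would replace the duplicated per-call-site jump tables; the savings Richard quotes assume 4 redundant copies of 16 eight-byte entries, 4*8*16 = 512 bytes.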
> (2) The two instances of jump tables involving palignr can be done
> just as easily by re-reading the data via an unaligned load. From a
> hot cache, that's surely faster than anything else we could do here.
>
Sure.
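The unaligned-reload replacement amounts to something like this sketch (the names p and ofs are illustrative; palignr's immediate would otherwise force another 16-way dispatch):

```c
#include <emmintrin.h>
#include <stddef.h>

/* Illustrative sketch: instead of dispatching on `ofs` to one of 16
   "palignr $N" forms to extract 16 bytes at a variable offset, just
   re-read the buffer with one unaligned load (movdqu).  From a hot
   cache this beats any jump table.  */
static __m128i
load_at_offset (const unsigned char *p, size_t ofs)
{
  return _mm_loadu_si128 ((const __m128i *) (p + ofs));
}
```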
I have been wishing for variable vector shift instructions.
H.J.