This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Proposal to handle __strstr_sse42 and friends issue on x86
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Allan McRae <allan at archlinux dot org>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Sat, 14 Dec 2013 20:18:19 +0100
- Subject: Re: Proposal to handle __strstr_sse42 and friends issue on x86
- Authentication-results: sourceware.org; auth=none
- References: <52A7B7E5 dot 6020607 at archlinux dot org>
On Wed, Dec 11, 2013 at 10:55:01AM +1000, Allan McRae wrote:
> Hi all,
>
> For those who need some background, see [1]. In short, there is an
> issue with __strstr_sse42 on x86 which has a variety of workarounds.
>
> Some distributions re-add the inline statement, which is clearly fragile
> and not a fix. Others remove the sse42 string functions - see [2].
>
> I am going to propose we adopt the removal of the SSE42 routines. We
> can not ensure that binaries are built with a new enough compiler (gcc
> after 2000) and keep backwards compatibility. Also, ensuring the stack
> is aligned when entering these functions would be a performance hit that
> would likely remove any advantage of the sse42 routine (not tested...),
> and there are proposals to remove the sse42 routines for both x86 and
> x86_64 due to quadratic complexity anyway [3,4].
>
> So applying the patch in [2] seems the best approach to me? Any
> comments/objections?
>
I sent a patch that improves strstr performance. It was acked by Liubov
Dmitrieva; I asked whether there were more comments and then forgot about it.
I have applied it now.
The sse42 routines are quite inefficient in that regard; with plain sse2 you
can be around five times faster. I planned to add a version that avoids
unaligned loads for older processors.
You can also use this one; you would just improve performance 15 times instead
of 30 if you expanded the unaligned loads into aligned ones.
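
To illustrate what "expanding unaligned loads into aligned ones" could look
like, here is a minimal sketch. It is not the patch discussed above and not
glibc code: the name sketch_strstr is hypothetical, __builtin_ctz assumes GCC
or Clang, and the SSE2 path only scans for the first needle byte before
verifying each candidate with strncmp, so its worst case is still quadratic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper for illustration only; not glibc's implementation. */
static const char *sketch_strstr(const char *hay, const char *needle)
{
    assert(hay != NULL && needle != NULL);

    size_t nlen = strlen(needle);
    if (nlen == 0)
        return hay;

#if defined(__SSE2__)
    const __m128i first = _mm_set1_epi8(needle[0]);
    const __m128i zero = _mm_setzero_si128();

    /* Round the start down to a 16-byte boundary so every load is aligned.
       An aligned 16-byte block never spans a page, so the load cannot fault,
       although it reads bytes outside the string (sanitizers may object). */
    uintptr_t addr = (uintptr_t)hay;
    const char *block = (const char *)(addr & ~(uintptr_t)15);
    unsigned valid = 0xFFFFu << (unsigned)(addr & 15); /* drop bytes before hay */

    for (;;) {
        __m128i v = _mm_load_si128((const __m128i *)block);
        unsigned hits = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, first)) & valid;
        unsigned ends = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) & valid;

        if (ends != 0)
            hits &= (ends & -ends) - 1;   /* keep hits before the terminator */

        while (hits != 0) {
            unsigned i = (unsigned)__builtin_ctz(hits);
            if (strncmp(block + i, needle, nlen) == 0)
                return block + i;
            hits &= hits - 1;             /* clear the lowest candidate bit */
        }

        if (ends != 0)
            return NULL;                  /* terminator reached, no match */

        block += 16;
        valid = 0xFFFFu;                  /* later blocks are fully valid */
    }
#else
    /* Portable fallback with the same candidate-then-verify structure. */
    for (const char *p = hay; *p != '\0'; ++p)
        if (*p == needle[0] && strncmp(p, needle, nlen) == 0)
            return p;
    return NULL;
#endif
}
```

Only the first block needs the extra mask; every later block is consumed
whole, which is why the aligned variant costs little beyond the initial
setup.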