This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Proposal to handle __strstr_sse42 and friends issue on x86


On 14 Dec 2013, Ondřej Bílka said:

> On Wed, Dec 11, 2013 at 10:55:01AM +1000, Allan McRae wrote:
>> would likely remove any advantage of the sse42 routine (not tested...),
>> and there are proposals to remove the sse42 routines for both x86 and
>> x86_64 due to quadratic complexity anyway [3,4].

Please. Half the GNU tools end up replacing strstr() with gnulib's
replacement strstr anyway because of this. And they're right to do so.
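To make "quadratic complexity" concrete, here is a brute-force search that counts character comparisons. It is not the __strstr_sse42 code; `naive_strstr` is a hypothetical name for an illustrative sketch of the kind of algorithm that degrades quadratically. On a haystack of n 'a's and a needle of m-1 'a's followed by 'b', every one of the n-m+1 starting positions costs m comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Brute-force substring search (illustrative, NOT the __strstr_sse42 code).
   Tries every starting position and compares the needle character by
   character; *comparisons counts those character comparisons. */
static const char *naive_strstr(const char *hay, const char *needle,
                                unsigned long *comparisons)
{
    assert(hay != NULL && needle != NULL && comparisons != NULL);
    size_t n = strlen(hay), m = strlen(needle);
    *comparisons = 0;
    if (m == 0)
        return hay;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            ++*comparisons;
            if (hay[i + j] != needle[j])
                break;          /* mismatch: restart at the next position */
            j++;
        }
        if (j == m)
            return hay + i;
    }
    return NULL;
}
```

For n = 1000 and m = 100 that is 901 × 100 = 90,100 comparisons, and doubling both sizes roughly quadruples the count.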

> sse42 routines are quite ineffective in that regard; with plain sse2 you
> can be around five times faster. I planned to add a version that avoids
> unaligned loads for older processors.

I'd say it's not worth bothering with any of this unless it implements
the same algorithm as the generic C strstr() (the Two-Way algorithm),
rather than implementing something with quadratic slowdown in really
fast assembler. It doesn't matter if strstr() is imperceptibly faster on
tiny needle / haystack combinations if it slows down quadratically on the
big ones, where the performance hit is most noticeable anyway. (Do we
even know the distribution of needle / haystack sizes on real systems? A
preloaded wrapper could tell us...)
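One way to collect those numbers is an LD_PRELOAD shim. This is a minimal sketch, assuming a dynamic linker that supports dlsym(RTLD_NEXT, ...) as glibc's does: it logs the haystack and needle lengths of every strstr() call to stderr and then forwards to the real implementation. The file and library names below are hypothetical, and calls that the compiler folds or expands as builtins never reach the shim.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical LD_PRELOAD shim: record the sizes passed to strstr(). */
char *strstr(const char *haystack, const char *needle)
{
    /* Next strstr() in symbol lookup order, normally libc's. */
    static char *(*real_strstr)(const char *, const char *);

    if (real_strstr == NULL) {
        real_strstr = (char *(*)(const char *, const char *))
            dlsym(RTLD_NEXT, "strstr");
        assert(real_strstr != NULL); /* abort if the real strstr is missing */
    }
    fprintf(stderr, "strstr: haystack=%zu needle=%zu\n",
            strlen(haystack), strlen(needle));
    return real_strstr(haystack, needle);
}
```

Build it with something like `gcc -shared -fPIC -o strstr-log.so strstr-log.c -ldl` and run `LD_PRELOAD=./strstr-log.so some-program 2> sizes.log`. Note that strlen(haystack) reports the full haystack length, not the bytes strstr() actually scans before its first match, and the logging adds an extra linear pass per call.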

> You can also use this one; you just improve performance 15 times instead
> of 30 if you expand unaligned loads into aligned ones.

A 15-fold improvement is peanuts compared to the speedups you get from a
better algorithm -- and the generic code has a better algorithm than the
SSE4.2 code.
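The generic C code uses the Two-Way algorithm, which runs in linear time with constant extra space. Two-Way is too long for a short sketch, so the example below swaps in Knuth-Morris-Pratt as a simpler stand-in: it shares the linear worst-case bound (at most about 2n comparisons in the search phase) but needs an O(m) table, and it is not glibc's code. `kmp_strstr` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Knuth-Morris-Pratt search (a stand-in for Two-Way, NOT glibc's code).
   O(n + m) time with an O(m) failure table; *comparisons counts the
   haystack/needle character comparisons made in the search phase.
   Returns NULL if there is no match, or if the table cannot be allocated
   (a real implementation would have to handle that case differently). */
static const char *kmp_strstr(const char *hay, const char *needle,
                              unsigned long *comparisons)
{
    assert(hay != NULL && needle != NULL && comparisons != NULL);
    size_t m = strlen(needle);
    *comparisons = 0;
    if (m == 0)
        return hay;

    /* fail[j] = length of the longest proper prefix of needle[0..j]
       that is also a suffix of it. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return NULL;
    fail[0] = 0;
    for (size_t j = 1, b = 0; j < m; j++) {
        while (b > 0 && needle[j] != needle[b])
            b = fail[b - 1];
        if (needle[j] == needle[b])
            b++;
        fail[j] = b;
    }

    const char *result = NULL;
    size_t k = 0; /* needle characters currently matched */
    for (size_t i = 0; hay[i] != '\0'; i++) {
        for (;;) {
            ++*comparisons;
            if (hay[i] == needle[k]) {
                k++;
                break;
            }
            if (k == 0)
                break;
            k = fail[k - 1]; /* fall back without re-reading the haystack */
        }
        if (k == m) {
            result = hay + i + 1 - m;
            break;
        }
    }
    free(fail);
    return result;
}
```

On a haystack of 1000 'a's and a needle of 99 'a's followed by 'b', this makes 1,901 comparisons, where a brute-force search needs 901 × 100 = 90,100.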

