This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Status of strcmp.
- From: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 15 Aug 2013 14:01:34 +0400
- Subject: Re: Status of strcmp.
- References: <20130807140911 dot GA31968 at domone dot kolej dot mff dot cuni dot cz> <CAHjhQ926EE-MYDJR5Eftf+DUefBg-Gox0pw57vZ7XUwsO3OPJg at mail dot gmail dot com> <20130808190716 dot GA4589 at domone dot kolej dot mff dot cuni dot cz> <CAHjhQ92+C6uXyrUhTd3OWuoa6v2SeUaKLBuqaNX5Sqtn4ANBdg at mail dot gmail dot com> <CAHjhQ90S-1uBhwV44KODTcQkr=0U-P+_9Pu0O=RbYYY9e82JCA at mail dot gmail dot com> <20130809164420 dot GB4972 at domone dot kolej dot mff dot cuni dot cz> <CAHjhQ91rFwppQ4ixhPNuB9xe8FH9OrEoz3=eFrTQTscwOvSBCA at mail dot gmail dot com> <20130814215111 dot GB6769 at domone dot kolej dot mff dot cuni dot cz>
> Basically, for any two implementations I could find a distribution
> which says that A is a regression, but also a distribution that says
> B is a regression.
Yes, tuning string functions is a hard job now. We need good
benchmarks and good metrics to determine the most important cases.
When we contributed optimized implementations 2-3 years ago, we
competed with C versions or non-SSE assembler versions and got a
2-5x boost.
Now we are working at the scale of 10-35%, because all string
functions are already somewhat optimized.
If we consider your benchmarks as "merge criteria" benchmarks, can
you please make the documentation link work, so everyone can find out
what the gcc test, rand_L1, etc. mean and how you calculate the
average boost. It would be ideal to contribute the benchmarks to
glibc, because the current benchmarks are not reliable at this scale.
But if that is too big an effort, we should at least have clear
documentation in order to rely on your measurements.
My concern was only about the Atom results, where unaligned loads are
not that fast. The simple way is to leave Atom's version as is and to
proceed with the patch, boosting the other architectures where the
new version is faster on all the tests.
--
Liubov Dmitrieva
On Thu, Aug 15, 2013 at 1:51 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Wed, Aug 14, 2013 at 11:46:23AM +0400, Liubov Dmitrieva wrote:
> Sorry, but there was an error that I had not noticed. When I generated
> an implementation, gcc optimized away my check for whether I cross a
> page. Without that check, performance was much better on big sizes.
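[For context, the page-crossing check being discussed can be sketched along these lines. This is a simplified illustration, not the actual glibc code; the 4096-byte page size, the `load_crosses_page` name, and the surrounding comments are assumptions.]

```c
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size */
#define VEC_SIZE  16u    /* width of one 16-byte SSE load */

/* Returns nonzero if a VEC_SIZE-byte load starting at p would cross a
   page boundary -- i.e. the load could touch an unmapped page past the
   end of the string and fault.  This is the kind of check that must
   not be elided: if the compiler decides its result is unused, the
   code looks faster on big sizes while being incorrect.  */
static inline int
load_crosses_page (const char *p)
{
  return ((uintptr_t) p & (PAGE_SIZE - 1)) > PAGE_SIZE - VEC_SIZE;
}
```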
>
> I uploaded a fixed version that is better where there are unaligned
> loads, but not so much when it competes with the ssse3 one.
>
> It did not change the critical part, where the strings differ in the
> first 16 characters. In practice most of the time is spent there, and
> the new implementation considerably improves this case. On the gcc
> benchmark, the performance ratios were left almost unchanged by the
> slower loop.
>
> The reason these sizes are important is empirical: when I did
> measurements, around 90% of calls were of this type. You can also
> consider possible use cases: for sorting and searching, strings will
> likely differ in the first character; when checking against a fixed
> word, there are only a few words of 16 characters or more. Also, when
> I cross-checked this against how large the strings passed to strlen
> are, they are most of the time less than 80 characters, which also
> supports the importance of the header.
>
>
> It is better for sizes up to 64 bytes, which means that the header
> does a good job.
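[The 16-character "header" fast path discussed here can be sketched with SSE2 intrinsics roughly as follows. This is a hypothetical simplification, not the actual patch: the `strcmp_header` name is made up, and it assumes the page-cross check has already established that both 16-byte loads are safe.]

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Handle the common case where the strings differ (or both end)
   within their first 16 bytes.  Sets *done to 1 and returns the
   strcmp result in that case; sets *done to 0 when the first 16
   bytes match and contain no NUL, so the caller must fall through
   to the main loop.  Assumes both 16-byte loads cannot fault.  */
static int
strcmp_header (const char *s1, const char *s2, int *done)
{
  __m128i a = _mm_loadu_si128 ((const __m128i *) s1);
  __m128i b = _mm_loadu_si128 ((const __m128i *) s2);
  /* Byte is 0xff where s1 and s2 agree...  */
  __m128i eq = _mm_cmpeq_epi8 (a, b);
  /* ...and where s1 has its terminating NUL.  */
  __m128i end = _mm_cmpeq_epi8 (a, _mm_setzero_si128 ());
  /* Bit i is set iff byte i matches and is not the terminator.  */
  unsigned int mask = _mm_movemask_epi8 (_mm_andnot_si128 (end, eq));
  /* Index of the first byte that differs or terminates; 16 when
     none (the upper bits of ~mask are set, so ctz is defined).  */
  unsigned int idx = __builtin_ctz (~mask);
  if (idx < 16)
    {
      *done = 1;
      return (unsigned char) s1[idx] - (unsigned char) s2[idx];
    }
  *done = 0;
  return 0;
}
```

[Per the measurements quoted in this mail, around 90% of calls would resolve inside such a header, which is why the speed of the main loop matters comparatively little.]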
>
> This also touches on a limitation of my benchmark, which is the
> selection of the distribution. These checks tend to be corner cases,
> and from them alone it is hard to say whether more regressions are
> introduced than fixed.
>
> Basically, for any two implementations I could find a distribution
> which says that A is a regression, but also a distribution that says
> B is a regression.
>
> The best course could be to use a 16-byte ssse3 loop or something
> else; I do not know yet.
>