This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Optimize strstr, strcasestr and memmem


On 21/05/2012, at 2:11 AM, Carlos O'Donell wrote:

> On Fri, May 18, 2012 at 7:15 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>>>> The biggest problem I have with reviewing *any* of this code is that
>>>> performance is relative to benchmarks.
>>>> 
>>>> As a community we have no baseline benchmark numbers.
>>> 
>>> Precisely - the very reason that I have not tried to further optimize
>>> anything in glibc beyond my twoway string, and the reason the SSE4
>>> quadratic pessimization even got in to glibc in the first place, is
>>> because we don't have a good benchmark.
>> 
>> I've been working on benchmarking patches for a couple of GLIBC subsystems lately (strstr and malloc), and it turned out to be a much more difficult task than it should be.  I'm thinking about the design of a benchmarking harness for the GLIBC testsuite to effectively address performance regressions and patch submissions.
> 
> This should not preclude other people from contributing machine
> specific benchmarks.

Certainly.  The design and thought process that I outlined will address the most immediate need: creating at least /some/ performance baseline for GLIBC.

>> The benchmark design goals that I have so far:
>> 
>> 1. Benchmarks will run for a fixed amount of time, say 30 seconds.  This will be controlled by a timer in test-skeleton.c similar to what is used to handle tests that time out.  We cannot afford to significantly extend the time it takes the GLIBC testsuite to run.
> 
> Why "fixed time?" Why not "fixed number of iterations?" It is already
> difficult for maintainers to evaluate TIMEOUT_FACTOR or TIMEOUT in the
> tests, extending this to the benchmarks is like throwing salt on a
> wound. I would prefer to see a fixed number of test executions rather
> than a fixed amount of time.

Why would we want to evaluate TIMEOUT_FACTOR and TIMEOUT for benchmarks beyond setting them to a single value for all tests?  The mechanism for controlling benchmark runtime that I have in mind is to run the benchmark body in a "while (run_benchmark) {}" loop with an alarm set to BENCHMARK_TIME seconds (30-120 seconds).  When the alarm goes off, the signal handler sets "run_benchmark = false", which stops the benchmark while allowing it to finish the current iteration.
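
In code, roughly something like the sketch below (do_one_iteration() and stop_benchmark() are placeholders I made up for illustration; the actual plumbing in test-skeleton.c may end up looking different):

#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* A single value for all benchmarks, in seconds.  */
#define BENCHMARK_TIME 30

/* Cleared by the SIGALRM handler to stop the benchmark loop.  */
static volatile sig_atomic_t run_benchmark = true;

static void
stop_benchmark (int sig)
{
  run_benchmark = false;
}

/* Placeholder for the actual benchmark body.  */
static void
do_one_iteration (void)
{
}

static int
do_test (void)
{
  unsigned long iterations = 0;

  signal (SIGALRM, stop_benchmark);
  alarm (BENCHMARK_TIME);

  while (run_benchmark)
    {
      do_one_iteration ();
      ++iterations;
    }

  /* Iterations completed in BENCHMARK_TIME seconds is the score.  */
  printf ("%lu iterations in %d seconds\n", iterations, BENCHMARK_TIME);
  return 0;
}

/* With test-skeleton.c this would normally be
   "#define TEST_FUNCTION do_test ()" rather than an explicit main.  */
int
main (void)
{
  return do_test ();
}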

For most benchmarks one can get reasonably precise results from a 30-120 second run.  Runs below 10 seconds will have too much startup/warmup error.  Runs above 120 seconds will just waste CPU time (assuming that the benchmark body can execute within 1-5 seconds, so that we get well-averaged results).

I'm tired of hand-tweaking iteration numbers for all the different systems out there.

> 
>> 2. Test-skeleton.c will execute the do_test() function in a loop, and the number of full executions divided by the time they took will be the benchmark's score.
> 
> Do you have a plan for the tests that don't run via test-skeleton.c? I
> would love to see test-skeleton.c expanded and converted to support
> all the tests.

The plan is to not handle them until they are converted to use test-skeleton.c.

> 
>> 3. There is no goal of making benchmark scores comparable between different systems.  Benchmark scores will be meaningful only within a single system/setup.  The goal is to provide supporting arguments for patch submissions and regression tracking.  For the second time in two years I'm looking at what in GLIBC has caused a 50% slowdown on a customer benchmark -- that's an overall benchmark slowdown, which means something in GLIBC got 5+ times slower.  Surely we can do better.
> 
> Agreed.
> 
>> 4. For tests that can handle dummy implementations of GLIBC functions (i.e., a function always returning the same value or otherwise being very fast) there will be a "training" mode to calculate the cost of the testing code / loops / etc. that is _not_ the functions being benchmarked.  After running the benchmark in training mode for, say, 5 seconds, the functions will be switched from dummy to the real ones and benchmarked.  Such an approach will make it possible to precisely estimate the performance of just the GLIBC functions of interest, removing the overhead of the testing/benchmarking harness from the overall benchmark score.
> 
> Sounds good.
> 
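
To make point 4 above a bit more concrete, here is a rough sketch of the training idea (dummy_strstr, run_loop and the literal strings are made up for illustration; the real harness would do the dummy/real switch inside test-skeleton.c rather than in the test itself):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TRAIN_TIME 5		/* Seconds spent measuring harness overhead.  */
#define BENCHMARK_TIME 30	/* Seconds spent on the real function.  */

static volatile sig_atomic_t running;

static void
stop (int sig)
{
  running = 0;
}

/* Dummy stand-in with the same signature as the real function; calling it
   measures only the cost of the harness around the call.  */
static char *
dummy_strstr (const char *haystack, const char *needle)
{
  return (char *) haystack;
}

/* Run the benchmark loop for SECONDS, calling IMPL each iteration, and
   return the number of iterations completed.  */
static unsigned long
run_loop (char *(*impl) (const char *, const char *), unsigned int seconds)
{
  unsigned long iterations = 0;

  running = 1;
  signal (SIGALRM, stop);
  alarm (seconds);

  while (running)
    {
      impl ("an example haystack for the benchmark", "needle");
      ++iterations;
    }

  return iterations;
}

static int
do_test (void)
{
  /* Training run: harness overhead only, via the dummy implementation.  */
  unsigned long train = run_loop (dummy_strstr, TRAIN_TIME);
  /* Real run: harness overhead plus the function under test.  */
  unsigned long real = run_loop (strstr, BENCHMARK_TIME);

  double overhead_per_iter = (double) TRAIN_TIME / train;
  double total_per_iter = (double) BENCHMARK_TIME / real;

  printf ("estimated time per strstr call: %g seconds\n",
          total_per_iter - overhead_per_iter);
  return 0;
}

int
main (void)
{
  return do_test ();
}
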
>>>> * A list of baseline benchmarks we are going to use to evaluate
>>>> performance related submissions.
>> 
>> As I said above, I think creating tests that are only benchmarks will be too much effort for anyone to actually do.
> 
> Yet you just have: you've made the entire testsuite into the first
> baseline benchmark. :-)
> 
> I foresee a need for machine specific micro-benchmarks being
> contributed. What's important to Power may be different from what's
> important to ARM or x86. We can't paint all machines with the same
> brush.

Yes.

> 
>>>> * Collect data on the runs.
>> 
>> ... as part of the default testsuite run.  My current preference is for a CSV-formatted summary file with a single number per benchmark.
> 
> That sounds good.
> 
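
For example, the summary could be as simple as one line per benchmark, with the score being the number of iterations completed in the fixed benchmark time (the numbers below are made up, just to show the shape):

  benchmark,score
  strstr,123456
  memmem,98765
  malloc,456789
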
>>>> * Record everything in the wiki including how to run the benchmarks
>>>> and the collection of growing results.
>> 
>> Yes.  I would prefer people with automated continuous build/test setups to push the CSV files into Google spreadsheets with a pretty graph showing geomeans over benchmarks over time.  We will then embed those "live" graphs into the wiki.
> 
> Agreed (except for the part about Google Docs).
> 
> We need to collect data, process it, and make some graphs, and keep
> doing that over and over.
> 
> We could easily add another git repo of just performance data?
> 
> That way the CI system checks out the performance data, appends to it
> after a run, and pushes the performance data out.
> 
> Then you could have consumers pull the new data and update
> visualizations based on that data?

This is a great idea.  I would love to see a [semi-]public-writable git repo where people could upload their benchmark results with a simple "make upload-benchmark" rule.  Kind of a next level up from the testresults mailing lists that many projects have.  With a file-per-machine policy one could then easily generate a plottable format from the git history of the file.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

