This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Optimize strstr, strcasestr and memmem


On Fri, May 18, 2012 at 7:15 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>>> The biggest problem I have with reviewing *any* of this code is that
>>> performance is relative to benchmarks.
>>>
>>> As a community we have no baseline benchmark numbers.
>>
>> Precisely - the very reason that I have not tried to further optimize
>> anything in glibc beyond my twoway string, and the reason the SSE4
>> quadratic pessimization even got into glibc in the first place, is
>> because we don't have a good benchmark.
>
> I've been working on benchmarking patches for a couple of GLIBC subsystems lately (strstr and malloc), and it turned out to be a much more difficult task than it should be. I'm thinking about the design of a benchmarking harness for the GLIBC testsuite to effectively address performance regressions and patch submissions.

This should not preclude other people from contributing
machine-specific benchmarks.

> The testsuite already has good coverage of the GLIBC functions, so I'm thinking of wrapping that in a benchmarking harness to get performance measurements out of what we already have, not creating subsystem-specific benchmarks. The current idea I'm looking at is extending the test-skeleton framework to gather performance numbers.

This sounds like a great idea.

Let me be clear though that I do not think this is a complete solution.

The testsuite is intended to test the APIs exported by GLIBC and nothing else.

We will need machine-specific benchmarks, and I expect them to be
added over time.

You are right though that benchmarking a run of the testsuite is a
good and practical first approach to this problem.

> The benchmark design goals that I have so far:
>
> 1. Benchmarks will run for a fixed amount of time, say 30 seconds. This will be controlled by a timer in test-skeleton.c similar to what is used to handle tests that time out. We cannot afford to significantly extend the time it takes the GLIBC testsuite to run.

Why a "fixed time"? Why not a "fixed number of iterations"? It is
already difficult for maintainers to evaluate TIMEOUT_FACTOR or
TIMEOUT in the tests; extending this to the benchmarks is like
rubbing salt in the wound. I would prefer to see a fixed number of
test executions rather than a fixed amount of time.
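
To make that concrete, here is a minimal sketch of the fixed-iteration
approach. The names (bench_strstr, ITERATIONS) and the inputs are made
up for illustration; none of this is the actual test-skeleton.c code:

    /* Minimal fixed-iteration benchmark sketch; hypothetical names,
       not part of test-skeleton.c.  */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define ITERATIONS 1000000L

    static const char *volatile sink;

    static void
    bench_strstr (void)
    {
      /* Store the result through a volatile pointer so the call
         cannot be optimized away.  */
      sink = strstr ("the quick brown fox jumps over the lazy dog",
                     "lazy");
    }

    int
    main (void)
    {
      struct timespec start, end;

      clock_gettime (CLOCK_MONOTONIC, &start);
      for (long i = 0; i < ITERATIONS; i++)
        bench_strstr ();
      clock_gettime (CLOCK_MONOTONIC, &end);

      double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
      /* The score is still executions per second, but the iteration
         count is fixed and reproducible.  */
      printf ("strstr: %.0f executions/sec\n", ITERATIONS / elapsed);
      return 0;
    }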

> 2. Test-skeleton.c will execute the do_test() function in a loop, and the number of full executions divided by the time the executions took will be the benchmark's score.

Do you have a plan for the tests that don't run via test-skeleton.c? I
would love to see test-skeleton.c expanded and converted to support
all the tests.
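
For reference, here is roughly how I understand proposal 2: a timer
bounds the run and the score is executions divided by elapsed time.
The signal plumbing and the do_test() stub below are hypothetical,
not the actual test-skeleton.c code:

    /* Sketch of a fixed-time loop around do_test(); hypothetical
       plumbing, not the actual test-skeleton.c.  */
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define BENCH_SECONDS 30

    static volatile sig_atomic_t timer_expired;

    static void
    alarm_handler (int sig)
    {
      (void) sig;
      timer_expired = 1;
    }

    static int
    do_test (void)
    {
      /* The existing test body would go here.  */
      return 0;
    }

    int
    main (void)
    {
      struct timespec start, end;
      long executions = 0;

      signal (SIGALRM, alarm_handler);
      alarm (BENCH_SECONDS);

      clock_gettime (CLOCK_MONOTONIC, &start);
      while (!timer_expired)
        {
          if (do_test () != 0)
            return 1;
          executions++;
        }
      clock_gettime (CLOCK_MONOTONIC, &end);

      double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
      printf ("score: %.2f executions/sec (%ld runs in %.1f s)\n",
              executions / elapsed, executions, elapsed);
      return 0;
    }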

> 3. There is no goal of making benchmark scores comparable between different systems. Benchmark scores will be meaningful only within a single system/setup. The goal is to provide supporting arguments for patch submissions and regression tracking. For the second time in two years I'm looking at what in GLIBC has caused a 50% slowdown on a customer benchmark -- that's an overall benchmark slowdown, which means something in GLIBC got 5+ times slower. Surely we can do better.

Agreed.

> 4. For tests that can handle dummy implementations of GLIBC functions (i.e., a function that always returns the same value or is otherwise very fast) there will be a "training" mode to calculate the cost of the testing code / loops / etc. that is _not_ part of the functions being benchmarked. After running the benchmark in training mode for, say, 5 seconds, the functions will be switched from dummy to the real ones and benchmarked. Such an approach will allow us to precisely estimate the performance of just the GLIBC functions of interest, removing the overhead of the testing/benchmarking harness from the overall benchmark score.

Sounds good.
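
To check my understanding of the training mode, a rough sketch; the
names dummy_strstr() and measure() are invented for illustration and
are not in the tree:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    static const char *volatile sink;

    /* Dummy implementation: returns immediately, so timing it
       measures only the harness overhead (loop, call, argument
       setup).  */
    static char *
    dummy_strstr (const char *haystack, const char *needle)
    {
      (void) needle;
      return (char *) haystack;
    }

    static double
    measure (char *(*fn) (const char *, const char *), long iters)
    {
      struct timespec start, end;

      clock_gettime (CLOCK_MONOTONIC, &start);
      for (long i = 0; i < iters; i++)
        sink = fn ("the quick brown fox jumps over the lazy dog",
                   "lazy");
      clock_gettime (CLOCK_MONOTONIC, &end);

      return (end.tv_sec - start.tv_sec)
             + (end.tv_nsec - start.tv_nsec) / 1e9;
    }

    int
    main (void)
    {
      const long iters = 1000000;
      double overhead = measure (dummy_strstr, iters); /* training */
      double total = measure (strstr, iters);          /* real run */

      /* Subtract the harness overhead to estimate the cost of
         strstr itself.  */
      printf ("strstr: %.2f ns/call\n",
              (total - overhead) / iters * 1e9);
      return 0;
    }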

>>> * A list of baseline benchmarks we are going to use to evaluate
>>> performance related submissions.
>
> As I said above, I think creating tests that are only benchmarks will be too much effort for anyone to actually do.

Yet you just have: you've made the entire testsuite into the first
baseline benchmark. :-)

I foresee a need for machine-specific micro-benchmarks being
contributed. What's important to Power may be different from what's
important to ARM or x86. We can't paint all machines with the same
brush.

>>> * Collect data on the runs.
>
> ... as part of the default testsuite run. My current preference is for a CSV-formatted summary file with a single number per benchmark.

That sounds good.
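
Purely to illustrate the shape of such a file (these benchmark names
and the "score" column are just placeholders, not a fixed proposal):

    benchmark,score
    strstr,<executions-per-sec>
    strcasestr,<executions-per-sec>
    memmem,<executions-per-sec>
    malloc,<executions-per-sec>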

>>> * Record everything in the wiki including how to run the benchmarks
>>> and the collection of growing results.
>
> Yes. I would prefer people with automated continuous build/test setups to push the CSV files into Google spreadsheets with a pretty graph showing geomeans across benchmarks over time. We will then embed those "live" graphs into the wiki.

Agreed (except for the part about Google Docs).

We need to collect data, process it, and make some graphs, and keep
doing that over and over.

We could easily add another git repo of just performance data?

That way the CI system checks out the performance data, appends to it
after a run, and pushes the performance data out.

Then you could have consumers pull the new data and update
visualizations based on that data?

Cheers,
Carlos.

