This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] benchtests/Makefile: Run the string benchmarks four times by default.


On 5 September 2013 17:04, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Sep 05, 2013 at 04:18:18PM +0100, Will Newton wrote:
>> On 5 September 2013 16:03, Ondřej Bílka <neleai@seznam.cz> wrote:
>> > On Thu, Sep 05, 2013 at 08:51:53AM +0100, Will Newton wrote:
>> >> The intention of my patch - which I may not have made completely clear
>> >> in the commit message - is to improve test stability. What I mean by
>> >> this is that with a physically indexed cache the physical pages
>> >> allocated to the test can have a significant effect on the performance
>> >> at large (e.g. cache size / ways and above) buffer sizes and this will
>> >> cause variation when running the same test multiple times. My aim is
>> >> to average out these differences as it is hard to control for them
>> >> without understanding the details of the cache subsystem of the system
>> >> you are running on.
>> >>
>> > This can be explained just by having more data. Simply multiplying the
>> > iteration count by four would then do the same job.
>>
>> No, it wouldn't. That would just mean four times as much data
>> resulting in a reduced variance but the same systematic error.
>>
> That is your claim. I am now asking you for the second time to prove it.
>
> As I wrote in a previous mail, in the same place:
>
> Please run your patch ten times and calculate the variance. Compare that
> to the variance when the iteration count is increased 4 times and show
> whether there is an improvement.

The benchmarks do not currently have any measure of variance, so it is
not possible to do this with the benchmarks as they stand. I have seen
this effect with other benchmarks, however.
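
To illustrate, here is a rough standalone sketch of the kind of
measurement I am describing (this is not the benchtests code; the
buffer size and iteration counts are arbitrary illustration values). It
runs the same memcpy measurement several times, allocating fresh
buffers for each run, and reports the mean and variance of the per-run
means:

/* Rough illustrative sketch, not benchtests code: time memcpy NRUNS
   times with freshly allocated buffers each run and report the mean
   and variance of the per-run means.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NRUNS 10
#define BUF_SIZE (2 * 1024 * 1024)
#define ITERS 200

static double
run_once (void)
{
  /* Fresh allocations each run, so the kernel may hand back different
     physical pages, which is the effect being averaged out.  */
  char *src = malloc (BUF_SIZE);
  char *dst = malloc (BUF_SIZE);
  struct timespec start, end;

  memset (src, 1, BUF_SIZE);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < ITERS; i++)
    memcpy (dst, src, BUF_SIZE);
  clock_gettime (CLOCK_MONOTONIC, &end);

  free (src);
  free (dst);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
  return ns / ITERS;
}

int
main (void)
{
  double sum = 0.0, sumsq = 0.0;

  for (int i = 0; i < NRUNS; i++)
    {
      double m = run_once ();
      sum += m;
      sumsq += m * m;
    }

  double mean = sum / NRUNS;
  double var = sumsq / NRUNS - mean * mean;

  printf ("mean of run means: %g ns, variance: %g\n", mean, var);
  return 0;
}

With fresh allocations per run, differences in the physical pages the
kernel hands back show up as spread in the per-run means, which is the
systematic effect that averaging across runs is meant to smooth out.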

>> >> Your test appears to be addressing concerns of test validity by
>> >> running a wider range of buffer alignments, which is an important but
>> >> separate concern IMO.
>> >>
>> > No, your patch will pick the src pointer from 4 different physical pages
>> > (one allocated in each run) and calculate the average performance.
>> >
>> > Mine will pick src pointers from 2000000/4096 = 488 different pages and
>> > calculate the average.
>>
>> Yes, this would work too. But it has a number of flaws:
>>
>> 1. It does not allow one to analyze the performance of the code across
>> alignments; everything gets averaged together.
>
> You cannot analyse performance across alignments now, as the benchmarks do
> not print the necessary data.

It currently prints the alignments of the buffers, which is all that is
required. I would agree, though, that the alignments chosen are a rather
poor selection.
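
As a rough sketch of the per-alignment analysis I have in mind (this is
not the benchtests output format; the alignment set, copy length and
iteration count are just illustrative), something like the following
prints each src/dst alignment pair alongside its timing, so behaviour
at each alignment stays visible instead of being averaged away:

/* Rough illustrative sketch, not benchtests code: time memcpy over a
   small set of src/dst alignments and print the alignments next to
   each result.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define LEN 4096
#define ITERS 100000

static double
time_copy (char *dst, const char *src, size_t len)
{
  struct timespec start, end;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < ITERS; i++)
    memcpy (dst, src, len);
  clock_gettime (CLOCK_MONOTONIC, &end);

  return ((end.tv_sec - start.tv_sec) * 1e9
          + (end.tv_nsec - start.tv_nsec)) / ITERS;
}

int
main (void)
{
  /* 64-byte aligned base buffers with room for the alignment offsets.  */
  char *src = aligned_alloc (64, LEN + 64);
  char *dst = aligned_alloc (64, LEN + 64);
  static const size_t aligns[] = { 0, 1, 4, 8, 16, 32 };
  const size_t n = sizeof aligns / sizeof aligns[0];

  memset (src, 1, LEN + 64);

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      printf ("src_align=%zu dst_align=%zu: %g ns\n",
              aligns[i], aligns[j],
              time_copy (dst + aligns[j], src + aligns[i], LEN));

  free (src);
  free (dst);
  return 0;
}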

>
>> 2. It has no mechanism for showing variance, whereas with multiple runs
>> of the same test the variance of the means can at least be seen.
>
> There is a pretty good mechanism for showing variance, and it is called
> calculating the variance. However, adding variance calculation is a
> separate issue.

I think you misunderstand me. The benchmarks as they stand do not
output any measure of variance. Doing multiple runs is a quick and easy
way to get a measure of variance without modifying the benchmarks or
their output.
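
For example, a trivial standalone filter along these lines (purely
hypothetical, not part of benchtests) could be fed one per-run mean per
line, cut from the existing output of several runs, and would print the
mean and sample variance across those runs without touching the
benchmarks themselves:

/* Rough illustrative sketch: read one per-run mean per line from stdin
   and print the mean and sample variance across runs.  */
#include <stdio.h>

int
main (void)
{
  double x, sum = 0.0, sumsq = 0.0;
  int n = 0;

  while (scanf ("%lf", &x) == 1)
    {
      sum += x;
      sumsq += x * x;
      n++;
    }

  if (n < 2)
    {
      fprintf (stderr, "need at least two runs\n");
      return 1;
    }

  double mean = sum / n;
  double var = (sumsq - n * mean * mean) / (n - 1);

  printf ("runs=%d mean=%g variance=%g\n", n, mean, var);
  return 0;
}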

>> 3. It only works for one test (memcpy).
>>
> It is a first step. Randomization is needed for all string functions, and
> it is better to start with a concrete example.

I agree completely; let's start by finding the best way to fix the
benchmarks, but once we have consensus I think it would be best to fix
all the benchmarks rather than leave some unfixed.

-- 
Will Newton
Toolchain Working Group, Linaro

