This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Support separate benchmark outputs


On Tue, Apr 16, 2013 at 05:55:44PM +0530, Siddhesh Poyarekar wrote:
> Hi,
> 
> Currently the benchmark supports simple computation of mean time,
> throughput, etc of functions over a varied set of inputs.  The string
> benchmarks measure multiple versions of the same function and do a
> comparison for different alignments and sizes and this does not fit
> into the generic output format.  It would be undesirable to try and
> fit them into the generic format either since their current output
> format is useful and may not be easily reproducible with the benchmark
> format.
> 
I have already written a system-wide profiler for string functions. It
aggregates the results so you do not have to.
I also included a unit test there. See kam/WWW/memcpy_profile.tar.bz2

I plan to integrate this into the dryrun framework.

> Hence the concept of benchmark sets, i.e. a set of benchmark
> measurements in a single program that have their own output.
> Essentially the only differentiator for this idea is that it prints
> its output into a separate file of its own and not in bench.out.  I've
> added a comment in benchtests/Makefile to explain how one can add a
> benchset.
> 
> In addition to support for benchsets, this patch also adds memcpy and
> memcpy-ifunc as proof of concept.  These are just test-memcpy and
> test-memcpy-ifunc copied over for now.  Once this patch is in, I will
> post patches to similarly copy over the rest of the string performance
> test functions.  After that, I will remove the performance measurement
> bits from the string/test-* and the correctness tests from
> benchtest/bench-*.
>
memcpy is about the most difficult function to benchmark. Most
parameters, such as cache layout, can only be observed in the wild.
A lot of the runtime is free because the processor does the work in the background.
I am still not sure how to capture finer effects like store-load forwarding.


> +      for (i = 0; i < 32; ++i)
> +	{
> +	  HP_TIMING_NOW (start);
> +	  CALL (impl, dst, src, len);
> +	  HP_TIMING_NOW (stop);
> +	  HP_TIMING_BEST (best_time, start, stop);
> +	}
> +
You simply cannot do measurements this way. They are biased, and
you will get a result that is about 20 cycles off because it does
not account for branch misprediction and a thousand other factors.
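
To illustrate the concern: the quoted loop times one call at a time and appears
to keep the best of 32 samples, so the timer overhead is a large share of each
sample. A minimal sketch of one common mitigation follows, assuming that timing
a batch of calls per sample and reporting the median is acceptable. It is not
the method of either poster and it does not address cache layout or branch
prediction effects. The names now_ns, median_u64, median_batch_ns, RUNS and
BATCH are hypothetical, and clock_gettime stands in for HP_TIMING_NOW.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical constants: samples taken, and calls timed per sample.  */
enum { RUNS = 101, BATCH = 1000 };

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Monotonic clock reading in nanoseconds.  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* qsort comparison callback for uint64_t values.  */
static int
cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a;
  uint64_t y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Sort the N samples in V in place and return the middle element.  */
static uint64_t
median_u64 (uint64_t *v, size_t n)
{
  assert (n > 0);
  qsort (v, n, sizeof *v, cmp_u64);
  return v[n / 2];
}

/* Time BATCH calls of FN per sample, take RUNS samples, and return the
   median sample in nanoseconds per batch.  The volatile pointer keeps
   the compiler from inlining or dropping the calls.  */
static uint64_t
median_batch_ns (copy_fn fn, char *dst, const char *src, size_t len)
{
  copy_fn volatile call = fn;
  uint64_t samples[RUNS];

  for (size_t r = 0; r < RUNS; ++r)
    {
      uint64_t start = now_ns ();
      for (size_t i = 0; i < BATCH; ++i)
        call (dst, src, len);
      samples[r] = now_ns () - start;
    }
  return median_u64 (samples, RUNS);
}
```

Calling median_batch_ns (memcpy, dst, src, len) and dividing by BATCH gives a
rough per-call figure. Batching spreads the timer cost over many calls, but it
also keeps the caches and branch predictor warm, so it measures a best case
rather than behaviour in the wild.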


