This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] benchtests: Add malloc microbenchmark
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Rich Felker <dalias at aerifal dot cx>, Will Newton <will dot newton at linaro dot org>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Tue, 15 Apr 2014 14:09:12 -0500
- Subject: Re: [PATCH] benchtests: Add malloc microbenchmark
- References: <1397568941-4298-1-git-send-email-will dot newton at linaro dot org> <1397576171 dot 12247 dot 7 dot camel at spokane1 dot rchland dot ibm dot com> <CANu=Dmji4SC2C2U4ps9Ci1LQq7gZ1GMz7BZXZ6n+zygMH8g78g at mail dot gmail dot com> <20140415162746 dot GY26358 at brightrain dot aerifal dot cx>
- Reply-to: munroesj at us dot ibm dot com
On Tue, 2014-04-15 at 12:27 -0400, Rich Felker wrote:
> On Tue, Apr 15, 2014 at 04:42:25PM +0100, Will Newton wrote:
> > On 15 April 2014 16:36, Steven Munroe <munroesj@linux.vnet.ibm.com> wrote:
> > > On Tue, 2014-04-15 at 14:35 +0100, Will Newton wrote:
> > >> Add a microbenchmark for measuring malloc and free performance. The
> > >> benchmark allocates and frees buffers of random sizes in a random
> > >> order and measures the overall execution time and RSS. Variants of the
> > >> benchmark are run with 8, 32 and 64 threads to measure the effect of
> > >> concurrency on allocator performance.
> > >>
> > >> The random block sizes used follow an inverse square distribution
> > >> which is intended to mimic the behaviour of real applications which
> > >> tend to allocate many more small blocks than large ones.
> > >>
> > >
> > > This test is more likely to measure the locking overhead of random than
> > > it is to measure malloc performance.
> >
> > It uses rand_r so I don't think this is the case.
>
> If you're using rand_r, you need to be careful how you use the output,
> as glibc's rand_r implementation has very poor statistical properties.
> See:
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615
>
> snip
>
> > The benchmark code spends roughly 80% of its time within malloc/free
> > and friends, which is good, but does leave some room for improvement.
> > Around 10% of the time is spent in dealing with random number
> > generation so maybe a simple inline random number generator would
> > improve things.
>
I personally strive for 95-99% of the time to be spent in the
software-under-test (SUT). This is much harder than it looks, but it can
and should be done.
The other issue to watch for is gettimeofday/clock_gettime overhead.
You need to run the SUT long enough that reading and converting the
clock is not a factor in the measurement.
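As an illustration, here is a minimal C sketch, assuming POSIX clock_gettime with CLOCK_MONOTONIC. It estimates the cost of one clock read and checks that a timed batch is long enough to amortize it. The helper names and the 1000x threshold are hypothetical and do not come from the patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static inline uint64_t ts_to_ns(const struct timespec *ts) {
  return (uint64_t) ts->tv_sec * 1000000000u + (uint64_t) ts->tv_nsec;
}

/* Read the monotonic clock in nanoseconds. */
static inline uint64_t now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts_to_ns(&ts);
}

/* Estimate the cost of one clock read by timing many back-to-back reads. */
static uint64_t timer_overhead_ns(unsigned reps) {
  uint64_t start = now_ns();
  for (unsigned i = 0; i < reps; i++)
    (void) now_ns();
  uint64_t end = now_ns();
  return reps ? (end - start) / reps : 0;
}

/* Time `iters` calls of fn inside ONE measured interval, so the two clock
   reads are amortized over the whole batch rather than paid per call. */
static uint64_t time_batch_ns(void (*fn)(void), uint64_t iters) {
  uint64_t start = now_ns();
  for (uint64_t i = 0; i < iters; i++)
    fn();
  return now_ns() - start;
}

/* Hypothetical rule of thumb: the measured interval should be at least
   1000x the cost of a clock read (clock overhead under ~0.1%). */
static int interval_is_long_enough(uint64_t elapsed_ns, uint64_t overhead_ns) {
  return elapsed_ns >= 1000 * (overhead_ns ? overhead_ns : 1);
}
```

The intended usage is to call timer_overhead_ns() once at startup. You then keep growing the batch size passed to time_batch_ns() until interval_is_long_enough() holds, so that clock reads never dominate the SUT.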
> What about just pregenerating a large array of random numbers and
> accessing sequential slots of the array? This potentially has cache
> issues but it might be possible to simply use a small array and wrap
> back to the beginning, perhaps performing a trivial operation like
> adding the last output of the previous run onto the value in the
> array.
>
This is generally a better design for a micro-benchmark.
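Below is a minimal C sketch of that design, with several assumptions that should be read as illustrative only:

- The table size is hypothetical.
- The table is filled with xorshift32 instead of rand_r.
- Each pass over the table adds the last value returned by the previous pass. This is one reading of "adding the last output of the previous run".
- An inverse-transform mapping produces inverse-square block sizes.

None of this is taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size: small enough to stay cache-resident, and a power of
   two so the index wraps with a mask. */
#define RAND_TABLE_SIZE 256

static uint32_t rand_table[RAND_TABLE_SIZE];

/* xorshift32 (Marsaglia); used only to fill the table before timing. */
static uint32_t xorshift32(uint32_t *state) {
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

static void rand_table_fill(uint32_t seed) {
  uint32_t s = seed ? seed : 1; /* xorshift state must be nonzero */
  for (size_t i = 0; i < RAND_TABLE_SIZE; i++)
    rand_table[i] = xorshift32(&s);
}

/* One cursor per thread: no shared state, so no locking in the hot loop. */
struct rand_cursor {
  size_t idx;
  uint32_t offset; /* last value returned by the previous pass */
  uint32_t last;   /* most recent value returned */
};

/* Read sequential slots, wrapping at the end; on each wrap the next pass
   is shifted by the final output of the pass that just finished. */
static inline uint32_t rand_next(struct rand_cursor *c) {
  uint32_t v = rand_table[c->idx] + c->offset;
  c->last = v;
  c->idx = (c->idx + 1) & (RAND_TABLE_SIZE - 1);
  if (c->idx == 0)
    c->offset = c->last;
  return v;
}

/* Map r to a size in [min_size, max_size] with density proportional to
   1/size^2, by inverting the CDF: s = 1 / (1/a - u * (1/a - 1/b)). */
static size_t inverse_square_size(uint32_t r, size_t min_size, size_t max_size) {
  assert(min_size > 0 && max_size >= min_size);
  double u = (double) r / 4294967296.0; /* u in [0, 1) */
  double inv_min = 1.0 / (double) min_size;
  double inv_max = 1.0 / (double) max_size;
  double s = 1.0 / (inv_min - u * (inv_min - inv_max));
  if (s < (double) min_size) s = (double) min_size; /* guard rounding */
  if (s > (double) max_size) s = (double) max_size;
  return (size_t) s;
}
```

All of the generator cost is paid in rand_table_fill() before the clock starts. Inside the timed loop, each benchmark thread then only does an array load, an add, and a mask, so nearly all of the measured time stays in malloc/free.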