This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Porting string performance tests into benchtests
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: David Miller <davem at davemloft dot net>
- Cc: siddhesh at redhat dot com, libc-alpha at sourceware dot org
- Date: Fri, 5 Apr 2013 16:00:34 +0200
- Subject: Re: [RFC] Porting string performance tests into benchtests
- References: <20130404033719 dot GA14860 at spoyarek dot pnq dot redhat dot com> <20130403 dot 234042 dot 1776194180184022553 dot davem at davemloft dot net> <20130404155521 dot GA18716 at domone dot kolej dot mff dot cuni dot cz> <20130404 dot 140224 dot 887954478230882152 dot davem at davemloft dot net>
On Thu, Apr 04, 2013 at 02:02:24PM -0400, David Miller wrote:
> From: Ondřej Bílka <neleai@seznam.cz>
> Date: Thu, 4 Apr 2013 17:55:21 +0200
>
> > On Wed, Apr 03, 2013 at 11:40:42PM -0400, David Miller wrote:
> >> From: Siddhesh Poyarekar <siddhesh@redhat.com>
> >> Date: Thu, 4 Apr 2013 09:07:19 +0530
> >>
> >> > On Wed, Apr 03, 2013 at 12:35:22PM -0400, David Miller wrote:
> >> >>
> >> >> I strongly prefer the raw cpu cycle counter read.
> >> >
> >> > Could you elaborate on that? Is it just a personal preference or is
> >> > some aspect of my argument in favour of clock_gettime incorrect or
> >> > irrelevant?
> >>
> >> I really want to see on the cpu cycle level whether the changes I make
> >> to the pre-loop and post-loop code make any difference.
> >>
> > Which as for str* majority of time is spend on pre/loop code is most
> > important to measure.
>
> Not for very small strings, where the pre and post loop costs dominate.
>
This is what I was trying to say, but I mistyped "pre-loop" as "pre/loop".
> >> And on sparc chips I don't have the issues that can make the cpu cycle
> >> counter inaccurate or less usable as a timing mechanism.
> >
> > Other benefit is that you can rapidly vary implementations. This mostly
> > eliminate biases caused by cpu frequency switching etc.
>
> My cpus don't switch frequency, that's what I'm trying to say.
Anyway, I have a benchmark which I developed mostly on x64. I am trying
to see how it could be extended to other architectures. Could you try:
http://kam.mff.cuni.cz/~ondra/memset_profile_aligned.tar.bz2
and run the ./benchmarks script.
As the generic fallback I use CLOCK_REALTIME; is there a way to get ticks
on your machine? (The logic is in the file utils.h.)
The results are written to the result* directories and compressed into
the results.tar.bz2 file.
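For illustration, the kind of selection I mean could look like the sketch below. This is not the actual content of utils.h: read_ticks is a hypothetical name, the x86 branch assumes the __rdtsc() intrinsic from <x86intrin.h>, and the sparc branch assumes a 64-bit SPARC V9 target where "rd %tick" is readable from user space. Everything else falls back to CLOCK_REALTIME in nanoseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Hypothetical helper: return a cycle count where one is available,
   otherwise nanoseconds from CLOCK_REALTIME. The unit therefore
   differs between architectures. */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Time stamp counter. */
    return __rdtsc();
#elif defined(__sparc_v9__) && defined(__arch64__)
    /* Assumption: %tick is readable from user mode on this system. */
    uint64_t t;
    __asm__ __volatile__("rd %%tick, %0" : "=r"(t));
    return t;
#else
    /* Generic fallback, as in the benchmark. */
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Results taken through different branches are not directly comparable, so they should only be compared within one machine.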
The benchmark has a second purpose: in real workloads memset almost
always has its start and end 8-byte aligned, so I want to see whether
taking advantage of that helps the header.
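As a rough sketch of the idea, and not the implementation in the tarball: when both the start address and the length are multiples of 8, the byte-alignment header can be skipped and whole 8-byte words stored directly. The name memset_aligned8_sketch is hypothetical, and the unaligned case simply defers to the library memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of an aligned fast path for memset. */
static void *memset_aligned8_sketch(void *dst, int c, size_t n)
{
    /* Start and length both multiples of 8 imply an 8-byte aligned end. */
    if ((((uintptr_t)dst | (uintptr_t)n) & 7) == 0) {
        /* Replicate the fill byte into all eight bytes of a word. */
        uint64_t pattern = 0x0101010101010101ull * (unsigned char)c;
        unsigned char *p = dst;

        /* memcpy of a fixed 8 bytes avoids aliasing problems; compilers
           normally turn it into a single 64-bit store. */
        for (size_t i = 0; i < n; i += 8)
            memcpy(p + i, &pattern, sizeof pattern);
        return dst;
    }

    /* Unaligned start or end: use the ordinary implementation. */
    return memset(dst, c, n);
}
```

Whether the extra test at the top pays off is exactly what the benchmark is meant to measure; the sketch only shows which work the aligned case can avoid.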