This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.

Re: PATCH: Optimize memcmp for ia32

From: Jakub Jelinek <jakub at redhat dot com>
To: "H. J. Lu" <hjl at lucon dot org>
Cc: GNU C Library <libc-alpha at sources dot redhat dot com>
Date: Tue, 10 Feb 2004 16:20:01 +0100
Subject: Re: PATCH: Optimize memcmp for ia32
References: <20040205001126.GA24827@lucon.org> <20040210144819.GA11273@sunsite.ms.mff.cuni.cz> <20040210171830.GA5977@lucon.org>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Tue, Feb 10, 2004 at 09:18:30AM -0800, H. J. Lu wrote:
> On Tue, Feb 10, 2004 at 03:48:19PM +0100, Jakub Jelinek wrote:
> > On Wed, Feb 04, 2004 at 04:11:26PM -0800, H. J. Lu wrote:
> > > This patch optimizes memcmp for ia32. I got an average speedup of around
> > > 400%.
> > 
> > If nothing else, you should certainly handle PIC vs. !PIC differently
> > (for !PIC you don't need to call the thunk etc.).
> 
> I can change it.
> 
> > Also, why do you need to use %ebx register when for example %eax is always
> > available?
> 
> I will take a look.
> 
> > Why do you need 4 separate L(Nbytes) sequences when the only difference between
> > them is in the last few instructions?  The bigger the routine is, the more
> > other instructions will be kicked out of the caches (especially for a
> > routine which is not the topmost in the benchmarks).
> > I'd say avoiding the table_32bytes table altogether, using just one of the
> > 4 sequences (with adjusted start) and computing the jump destination in
> > registers shouldn't slow things down.
> 
> The adjustment may cause a slowdown. With the jump table, we don't
> need to adjust anything at all for memory blocks smaller than 32 bytes.
> That is where the speedup comes from.

I meant instead of
	addl	%ecx, %edx
	addl	%ecx, %esi
do:
	andl	$-4, %ecx
	addl	%ecx, %edx
	addl	%ecx, %esi
or something like that (then you'd just start with -28(%esi) for the 4
cases).  The previous value of %ecx & 3 would need to be preserved until
the end, e.g. in the %ebx register (which could be replaced with %eax),
and you could hardcode that it jumps to L(28bytes) + 14 * (INDEX / 4).
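
To make the arithmetic above concrete, here is a rough C sketch of the same
idea, not the actual patch. The names split_len, entry_offset and
small_memcmp are hypothetical, and treating INDEX as a byte count is an
assumption; the 14 is simply the per-step size taken from the formula above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a length into the part rounded down to a
   multiple of 4 (what "andl $-4, %ecx" leaves in %ecx) and the low two
   bits (the "%ecx & 3" value that has to be preserved for the tail). */
static void split_len(size_t len, size_t *rounded, size_t *tail)
{
    *rounded = len & ~(size_t)3;
    *tail = len & 3;
}

/* Hypothetical helper: offset of the entry point inside one unrolled
   sequence, following "14 * (INDEX / 4)". Assumption: INDEX is a byte
   count and every 4-byte step takes 14 bytes of code. */
static size_t entry_offset(size_t index)
{
    return 14 * (index / 4);
}

/* Hypothetical small-block compare: advance both pointers past the
   whole-word part, walk the words with negative offsets (in the style
   of -28(%esi), -24(%esi), ...), then compare the remaining 0..3 bytes. */
static int small_memcmp(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    size_t rounded, tail;

    split_len(len, &rounded, &tail);
    p += rounded;   /* like addl %ecx, %edx after the andl */
    q += rounded;   /* like addl %ecx, %esi after the andl */

    for (size_t off = rounded; off > 0; off -= 4) {
        int r = memcmp(p - off, q - off, 4);
        if (r != 0)
            return r;
    }
    return tail ? memcmp(p, q, tail) : 0;
}
```

The point of the sketch is only that a single word-stepping sequence plus a
preserved remainder covers every length, which is why one of the four
sequences might be enough.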

	Jakub

Follow-Ups:
- Re: PATCH: Optimize memcmp for ia32
  - From: H. J. Lu

References:
- PATCH: Optimize memcmp for ia32
  - From: H. J. Lu
- Re: PATCH: Optimize memcmp for ia32
  - From: Jakub Jelinek
- Re: PATCH: Optimize memcmp for ia32
  - From: H. J. Lu
