This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3] faster strlen on x64


On Thu, Jan 31, 2013 at 03:40:44PM +0400, Dmitrieva Liubov wrote:
> Looks good to me.
> I don't see format issues for this version.
> 
> Do you have strnlen performance data as your patch impacts strnlen also?
>
My problem is that none of the programs I use call strnlen, so I modified
my profiler instead. I gathered the data through strlen by defining

size_t strlen (const char *x)
{
  /* The limit is far larger than any real string.  */
  return strnlen (x, 1L << 60);
}

This assumes that the limit passed to strnlen is never reached. The profiler
and data are at

http://kam.mff.cuni.cz/~ondra/strnlen_profile.tar.bz2
http://kam.mff.cuni.cz/~ondra/stnrlen_results.tar.bz2

On my Sandy Bridge the new implementation is faster on every test except
the instruction cache test.

Which one is better depends on whether programs that use strnlen call it
often or only occasionally.

An alternative, if ifunc modification support is added, is to start with a
simple implementation and switch to the faster one after, say, 256 calls.
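
A minimal sketch of that switching idea, using a plain function pointer
rather than ifunc (an ifunc symbol is resolved once, which is why the
modification support would be needed). All names here (adaptive_strnlen,
strnlen_simple, strnlen_fast, SWITCH_THRESHOLD) are hypothetical, and the
"fast" path is only a memchr stand-in, not the SSE2 code from the patch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: the "say 256 calls" figure from the text. */
#define SWITCH_THRESHOLD 256

/* Simple byte-at-a-time loop: small code footprint, slow on long strings. */
static size_t strnlen_simple(const char *s, size_t maxlen)
{
    size_t i = 0;
    while (i < maxlen && s[i] != '\0')
        i++;
    return i;
}

/* Stand-in for an optimized version; it delegates to memchr. */
static size_t strnlen_fast(const char *s, size_t maxlen)
{
    const char *p = memchr(s, '\0', maxlen);
    return p ? (size_t)(p - s) : maxlen;
}

static size_t strnlen_counting(const char *s, size_t maxlen);

/* Current implementation; starts with the counting wrapper. */
static size_t (*strnlen_impl)(const char *, size_t) = strnlen_counting;
static unsigned long strnlen_calls;

/* Counts calls and redirects the pointer once the threshold is reached.
   Not thread-safe; a real version would need atomic updates. */
static size_t strnlen_counting(const char *s, size_t maxlen)
{
    if (++strnlen_calls >= SWITCH_THRESHOLD)
        strnlen_impl = strnlen_fast;
    return strnlen_simple(s, maxlen);
}

/* Entry point callers would use in place of strnlen. */
size_t adaptive_strnlen(const char *s, size_t maxlen)
{
    return strnlen_impl(s, maxlen);
}
```

Programs that call it only occasionally stay on the small loop and never
touch the larger code; programs that call it often pay for the counter on
the first 256 calls and then go straight to the fast version.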

> Could you please extract a short performance summary, such as the average
> gain in % for AMD, Atom, SNB, IVB, and Haswell?
> 
> --
> Liubov Dmitrieva
> Software Engineer
> Intel Corporation
> 
> 2013/1/31 Ondřej Bílka <neleai@seznam.cz>:
> > Hi,
> >
> > After testing by Liuba I prepared the final version of my patch
> > (attached and on the neleai/strlen branch).
> >
> > I used hooking to examine the behaviour of the implementations in the
> > wild; the profiler can be downloaded from
> > http://kam.mff.cuni.cz/~ondra/strlen_profile.tar.bz2
> > (Run ./benchmarks for unit tests; read TODO, as it is not complete.)
> >
> > No additional failures on x64.
> >
> > Uses of strlen_* in strcat are inlined for now; optimizations will come
> > after I deal with strcpy.
> >
> > It could also be used in the linker; I split this functionality into
> > an additional patch.
> >
> > Ondra
> >
> > 2013-01-31  Ondrej Bilka  <neleai@seznam.cz>
> >
> >         * sysdeps/x86_64/strlen.S: Replace with new SSE2 based
> >         implementation which is faster on all x86_64 architectures.
> >         Tested on AMD, Intel Nehalem, Atom, SNB, IVB, Haswell.
> >         * sysdeps/x86_64/strnlen.S: Likewise.
> >
> >         * sysdeps/x86_64/multiarch/Makefile (sysdep_routines):
> >         Remove all multiarch strlen and strnlen versions.
> >         * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update.
> >         Remove strlen and strnlen related parts.
> >
> >         * sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Update.
> >         Inline strlen part.
> >         * sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.
> >
> >         * sysdeps/x86_64/multiarch/strlen.S: Remove.
> >         * sysdeps/x86_64/multiarch/strlen-sse2-no-bsf.S: Remove.
> >         * sysdeps/x86_64/multiarch/strlen-sse2-pminub.S: Remove.
> >         * sysdeps/x86_64/multiarch/rtld-strlen.S: Remove.
> >         * sysdeps/x86_64/multiarch/strlen-sse4.S: Remove.
> >         * sysdeps/x86_64/multiarch/strnlen.S: Remove.
> >         * sysdeps/x86_64/multiarch/strnlen-sse2-no-bsf.S: Remove.

-- 

You need to install an RTFM interface.

