This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2][BZ #16640] Remove strtok assembly implementation.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 14 Mar 2014 11:37:09 +0100
- Subject: Re: [PATCH 2/2][BZ #16640] Remove strtok assembly implementation.
- Authentication-results: sourceware.org; auth=none
- References: <20140227123238 dot GA26291 at domone dot podge> <20140227124206 dot GA26474 at domone dot podge> <5318A03D dot 3000705 at redhat dot com> <20140306163241 dot GA11843 at domone dot podge> <5318B58B dot 5040704 at redhat dot com> <20140306205212 dot GB11843 at domone dot podge> <53192422 dot 2050101 at redhat dot com> <20140310212923 dot GG6407 at domone dot podge> <53214E1B dot 6040601 at redhat dot com> <20140314102627 dot GC13535 at domone dot podge>
On Fri, Mar 14, 2014 at 11:26:27AM +0100, OndÅej BÃlka wrote:
> On Thu, Mar 13, 2014 at 02:20:11AM -0400, Carlos O'Donell wrote:
> > On 03/10/2014 05:29 PM, OndÅej BÃlka wrote:
> > > On Thu, Mar 06, 2014 at 08:42:58PM -0500, Carlos O'Donell wrote:
> > >> On 03/06/2014 03:52 PM, OndÅej BÃlka wrote:
> > >>> On Thu, Mar 06, 2014 at 12:51:07PM -0500, Carlos O'Donell wrote:
> > >>>> On 03/06/2014 11:32 AM, OndÅej BÃlka wrote:
> > >>>>> On Thu, Mar 06, 2014 at 11:20:13AM -0500, Carlos O'Donell wrote:
> > >>>>>> On 02/27/2014 07:42 AM, OndÅej BÃlka wrote:
> > >>>>>>> As followup this patch removes strtok assembly implementation as it is
> > >>>>>>> around 2-4 times slower on sse4_2 capable machines which is a majority,
> > >>>>>>> as previous benchtest demonstrated.
> > >>>>>>>
> > >>>>>>> You could try to gain some cycles on older machines by using ifuncs but
> > >>>>>>> is a strtok important enough to complicate matters?
> > >>>>>>>
> > >>>>>>> OK to commit?
> > >>>>>>>
> > >>>>>>> * sysdeps/i386/i686/strtok.S: Remove.
> > >>>>>>> * sysdeps/i386/i686/strtok_r.S: Likewise.
> > >>>>>>> * sysdeps/i386/strtok.S: Likewise.
> > >>>>>>> * sysdeps/i386/strtok_r.S: Likewise.
> > >>>>>>> * sysdeps/x86_64/strtok.S: Likewise.
> > >>>>>>> * sysdeps/x86_64/strtok_r.S: Likewise.
> > >>>>>>
> > >>>>>> Is this still the case with the oldest compiler we support?
> > >>>>>>
> > >>>>> I do not understand what are you reffering to.
> > >>>>
> > >>>> If you remove the strtok assembly implemetnation, the build will
> > >>>> fall back on use the C implementation e.g. string/strtok.c.
> > >>>> That C implementation will have been compiled by whichever
> > >>>> compiler you're using for your build. In glibc we support a wide
> > >>>> range of compilers from gcc 4.4 and newer. Do the older compilers,
> > >>>> specifically gcc 4.4, still generate good code for strtok.c which
> > >>>> is faster than strtok.S?
> > >>>>
> > >>> There is nothing to optimize, strtok code consist from two function
> > >>> calls and some ifs and I do not see optimization oppurtinities here.
> > >>>
> > >>
> > >> The problem I'm seeing is that we are removing assembly versions
> > >> which might be faster than the code generated by the oldest and
> > >> possible worst performing compiler that we support (assuming that
> > >> gcc gets better with each revision).
> > >>
> > >> Could you please test with gcc 4.4 and see if it still generates
> > >> 2-4x faster strtok or if the assembly version is faster?
> > >>
> > > When I tested this I found no difference in compilers but found that
> > > speedup was caused by inlining. There is little difference on
> > > implementations when I do not inline it.
> >
> > Are you saying then that the resulting code generated with gcc 4.4
> > is approximately the same performance as that generated with a
> > newer gcc, both of which are 2-4x faster than the assembly
> > implementation?
> >
> > If that's the case then please commit your patch.
> >
> Not exactly. when I tested compilers strtok was slower because plt
> avoidance called default not sse4_2 one. It needs additional patch to
> remove hidden_def.
Second problem was that x86 string inlines kicked in here and they are
ineffective as they are. I would remove them for now.