This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speed up fnmatch.


On Thu, Oct 17, 2013 at 05:45:36PM +0200, OndÅej BÃlka wrote:
> ping
> On Wed, Oct 09, 2013 at 07:43:59PM +0200, OndÅej BÃlka wrote:
> > Hi, fnmatch on multibyte string is slow for several reasons.
> > Main reason is that we spend ages converting strings to wide
> > strings.
> > 
> > Most of time this overhead is not needed as user does a simple search
> > with no special characters. 
> > 
> > For utf8 these could be checked by strstr as there is only one way how
> > encode ascii-only sequence. I could generalize this to any encoding for
> > which this is true.
> > 
> > A strstr will quickly determine that there is no match because there is
> > no match with start.
> > 
> > I excluded all special characters from documentation and it passes test.
> > 
> > OK to commit?

I don't think so without significant further review.

> > 
> > 	* posix/fnmatch.c: Optimize likely case.
> > 
> > diff --git a/posix/fnmatch.c b/posix/fnmatch.c
> > index 0f26a2e..27aec2e 100644
> > --- a/posix/fnmatch.c
> > +++ b/posix/fnmatch.c
> > @@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
> >       int flags;
> >  {
> >  # if HANDLE_MULTIBYTE
> > -  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
> > +  int multibyte_needed = 0;
> > +  if (MB_CUR_MAX != 1)
> > +    {
> > +      char *encoding = nl_langinfo (CODESET);
> > +      if (strcmp (encoding, "UTF8"))

The CODESET string is "UTF-8", not "UTF8", so you were not testing
whatever codepath you intended to take place for UTF-8...

> > +  if (!multibyte_needed)
> > +    {
> > +      char start[16];
> > +      size_t i;
> > +      for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
> > +        start[i] = pattern[i];
> > +      start[i] = 0;
> > +      char *string2 = strstr (string, start);
> > +      if (!string2)
> > +        return FNM_NOMATCH;

While it's better than the current approach of unnecessary conversion,
I think strstr is still highly suboptimal in some corner cases. This
code should all eventually be rewritten, but for now your solution is
probably better than nothing. It does need more review for correctness
though.

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]