This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speed up fnmatch.


ping
On Wed, Oct 09, 2013 at 07:43:59PM +0200, OndÅej BÃlka wrote:
> Hi, fnmatch on multibyte string is slow for several reasons.
> Main reason is that we spend ages converting strings to wide
> strings.
> 
> Most of time this overhead is not needed as user does a simple search
> with no special characters. 
> 
> For utf8 these could be checked by strstr as there is only one way how
> encode ascii-only sequence. I could generalize this to any encoding for
> which this is true.
> 
> A strstr will quickly determine that there is no match because there is
> no match with start.
> 
> I excluded all special characters from documentation and it passes test.
> 
> OK to commit?
> 
> 	* posix/fnmatch.c: Optimize likely case.
> 
> diff --git a/posix/fnmatch.c b/posix/fnmatch.c
> index 0f26a2e..27aec2e 100644
> --- a/posix/fnmatch.c
> +++ b/posix/fnmatch.c
> @@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
>       int flags;
>  {
>  # if HANDLE_MULTIBYTE
> -  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
> +  int multibyte_needed = 0;
> +  if (MB_CUR_MAX != 1)
> +    {
> +      char *encoding = nl_langinfo (CODESET);
> +      if (strcmp (encoding, "UTF8"))
> +	multibyte_needed = 1;
> +    }
> +
> +  /* ASCII with \+/.*?[{(@! excluded.  */
> +  static unsigned char normal[256] = {
> + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> + 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
> + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
> + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> +
> +  if (!multibyte_needed)
> +    {
> +      char start[16];
> +      size_t i;
> +      for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
> +        start[i] = pattern[i];
> +      start[i] = 0;
> +      char *string2 = strstr (string, start);
> +      if (!string2)
> +        return FNM_NOMATCH;
> +    }
> +  if (MB_CUR_MAX != 1)
>      {
>        mbstate_t ps;
>        size_t n;

-- 

/dev/clue was linked to /dev/null


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]