This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Speed up fnmatch.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: libc-alpha at sourceware dot org
- Date: Thu, 17 Oct 2013 17:45:36 +0200
- Subject: Re: [PATCH] Speed up fnmatch.
- Authentication-results: sourceware.org; auth=none
- References: <20131009174359 dot GA31917 at domone dot podge>
ping
On Wed, Oct 09, 2013 at 07:43:59PM +0200, OndÅej BÃlka wrote:
> Hi, fnmatch on multibyte string is slow for several reasons.
> Main reason is that we spend ages converting strings to wide
> strings.
>
> Most of time this overhead is not needed as user does a simple search
> with no special characters.
>
> For utf8 these could be checked by strstr as there is only one way how
> encode ascii-only sequence. I could generalize this to any encoding for
> which this is true.
>
> A strstr will quickly determine that there is no match because there is
> no match with start.
>
> I excluded all special characters from documentation and it passes test.
>
> OK to commit?
>
> * posix/fnmatch.c: Optimize likely case.
>
> diff --git a/posix/fnmatch.c b/posix/fnmatch.c
> index 0f26a2e..27aec2e 100644
> --- a/posix/fnmatch.c
> +++ b/posix/fnmatch.c
> @@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
> int flags;
> {
> # if HANDLE_MULTIBYTE
> - if (__builtin_expect (MB_CUR_MAX, 1) != 1)
> + int multibyte_needed = 0;
> + if (MB_CUR_MAX != 1)
> + {
> + char *encoding = nl_langinfo (CODESET);
> + if (strcmp (encoding, "UTF8"))
> + multibyte_needed = 1;
> + }
> +
> + /* ASCII with \+/.*?[{(@! excluded. */
> + static unsigned char normal[256] = {
> + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> + 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
> + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
> + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> +
> + if (!multibyte_needed)
> + {
> + char start[16];
> + size_t i;
> + for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
> + start[i] = pattern[i];
> + start[i] = 0;
> + char *string2 = strstr (string, start);
> + if (!string2)
> + return FNM_NOMATCH;
> + }
> + if (MB_CUR_MAX != 1)
> {
> mbstate_t ps;
> size_t n;
--
/dev/clue was linked to /dev/null