This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Speed-up character range regexes by up to 2x



What follows the review of the "gawk guy"'s regex patch:

> +#ifdef RE_ENABLE_I18N
>    int icase = (dfa->mb_cur_max == 1 && (bufp->syntax & RE_ICASE));
> +#else
> +  int icase = (bufp->syntax & RE_ICASE);
> +#endif

This is unneeded.


I mean that you probably added these fixes when MB_CUR_MAX was used because somebody got link errors for MB_CUR_MAX undefined; but now, dfa->mb_cur_max cannot possibly be undefined and will always be 1 if !RE_ENABLE_I18N.

@@ -2558,8 +2564,8 @@
                ? __btowc (start_ch) : start_elem->opr.wch);
     end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
              ? __btowc (end_ch) : end_elem->opr.wch);
-    cmp_buf[0] = start_wc;
-    cmp_buf[4] = end_wc;
+    cmp_buf[0] = start_wc != WEOF ? start_wc : start_ch;
+    cmp_buf[4] = end_wc != WEOF ? end_wc : end_ch;
     if (wcscoll (cmp_buf, cmp_buf + 4) > 0)
       return REG_ERANGE;

I am not sure this is the fix; maybe it is better not to include the character set if start_wc == WEOF || end_wc == WEOF, or to return REG_ERANGE?


I got WEOF and things core dumped (or equivalent).


Yes, it seems to me too that the case must be handled.

Paolo



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]