This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][TRY4][BZ #13637] fix false multi-byte matches in someregular expressions + testcase


On Fri, Feb 24, 2012 at 3:28 PM, Stanislav Brabec <sbrabec@suse.cz> wrote:
> [This patch contains Japanese UTF-8 characters in C code comments. Please use http://sourceware.org/bugzilla/show_bug.cgi?id=13637#c3 in case of problems.]
> * posix/regex_internal.c (re_string_skip_chars): Fix miscomputation
> of remain_len that may cause incomplete multi-byte character and
> false match.
> * posix/bug-regex33.c: New file.
> * posix/Makefile (tests): Add bug-regex33.

Excellent work!

I applaud your use of UTF-8 characters in C code comments, it makes
working through the bug that much easier (obviously you need a reader
that has the character set, and I do).

I've gone through your patch and testcase and I think it's correct.
The code is very complex and unfortunately not commented in detail.

The use of pstr->len is restricted to keeping track of the converted
length of pstr and should therefore not be used when processing the
string during character skipping as is the case of
re_string_skip_chars. In the case of
re_string_skip_chars you want to use the original raw length of the
string as you convert the raw multibyte string into wide characters.

See below for 3 comments.

> ---
> ÂChangeLog       Â|  Â9 ++++
> Âposix/Makefile     |  Â3 +-
> Âposix/bug-regex33.c  Â| Â116 ++++++++++++++++++++++++++++++++++++++++++++++++
> Âposix/regex_internal.c | Â Â2 +-
> Â4 files changed, 128 insertions(+), 2 deletions(-)
> Âcreate mode 100644 posix/bug-regex33.c
>
> diff --git a/ChangeLog b/ChangeLog
> index 153d80f..8e10109 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,12 @@
> +2012-02-24 ÂStanislav Brabec Â<sbrabec@suse.cz>
> +
> + Â Â Â [BZ #13637]
> + Â Â Â * posix/regex_internal.c (re_string_skip_chars): Fix miscomputation
> + Â Â Â of remain_len that may cause incomplete multi-byte character and
> + Â Â Â false match.
> + Â Â Â * posix/bug-regex33.c: New file.
> + Â Â Â * posix/Makefile (tests): Add bug-regex33.
> +
> Â2012-02-22 ÂJoseph Myers Â<joseph@codesourcery.com>
>
> Â Â Â Â[BZ #2547]
> diff --git a/posix/Makefile b/posix/Makefile
> index ba892f1..565861f 100644
> --- a/posix/Makefile
> +++ b/posix/Makefile
> @@ -82,7 +82,7 @@ tests     := tstgetopt testfnm runtests runptests   Â\
> Â Â Â Â Â Â Â Â Â bug-regex21 bug-regex22 bug-regex23 bug-regex24 \
> Â Â Â Â Â Â Â Â Â bug-regex25 bug-regex26 bug-regex27 bug-regex28 \
> Â Â Â Â Â Â Â Â Â bug-regex29 bug-regex30 bug-regex31 bug-regex32 \
> - Â Â Â Â Â Â Â Â Âtst-nice tst-nanosleep tst-regex2 \
> + Â Â Â Â Â Â Â Â Âbug-regex33 tst-nice tst-nanosleep tst-regex2 \
> Â Â Â Â Â Â Â Â Â transbug tst-rxspencer tst-pcre tst-boost \
> Â Â Â Â Â Â Â Â Â bug-ga1 tst-vfork1 tst-vfork2 tst-vfork3 tst-waitid \
> Â Â Â Â Â Â Â Â Â tst-getaddrinfo2 bug-glob1 bug-glob2 bug-glob3 tst-sysconf \
> @@ -208,6 +208,7 @@ bug-regex25-ENV = LOCPATH=$(common-objpfx)localedata
> Âbug-regex26-ENV = LOCPATH=$(common-objpfx)localedata
> Âbug-regex30-ENV = LOCPATH=$(common-objpfx)localedata
> Âbug-regex32-ENV = LOCPATH=$(common-objpfx)localedata
> +bug-regex33-ENV = LOCPATH=$(common-objpfx)localedata
> Âtst-rxspencer-ARGS = --utf8 rxspencer/tests
> Âtst-rxspencer-ENV = LOCPATH=$(common-objpfx)localedata
> Âtst-pcre-ARGS = PCRE.tests
> diff --git a/posix/bug-regex33.c b/posix/bug-regex33.c
> new file mode 100644
> index 0000000..0dc65c2
> --- /dev/null
> +++ b/posix/bug-regex33.c
> @@ -0,0 +1,116 @@
> +/* Test re_search with multi-byte characters in EUC-JP.
> + Â Copyright (C) 2012 Free Software Foundation, Inc.
> + Â This file is part of the GNU C Library.
> + Â Contributed by Stanislav Brabec <sbrabec@suse.cz>, 2012.
> +
> + Â The GNU C Library is free software; you can redistribute it and/or
> + Â modify it under the terms of the GNU Lesser General Public
> + Â License as published by the Free Software Foundation; either
> + Â version 2.1 of the License, or (at your option) any later version.
> +
> + Â The GNU C Library is distributed in the hope that it will be useful,
> + Â but WITHOUT ANY WARRANTY; without even the implied warranty of
> + Â MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ÂSee the GNU
> + Â Lesser General Public License for more details.
> +
> + Â You should have received a copy of the GNU Lesser General Public
> + Â License along with the GNU C Library; if not, write to the Free
> + Â Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
> + Â 02111-1307 USA. Â*/
> +
> +#define _GNU_SOURCE 1
> +#include <locale.h>
> +#include <regex.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +int
> +main (void)
> +{
> + Âstruct re_pattern_buffer r;
> + Âstruct re_registers s;
> + Âint e, rc = 0;
> + Âif (setlocale (LC_CTYPE, "ja_JP.EUC-JP") == NULL)
> + Â Â{
> + Â Â Âputs ("setlocale failed");
> + Â Â Âreturn 1;
> + Â Â}
> + Âmemset (&r, 0, sizeof (r));
> + Âmemset (&s, 0, sizeof (s));
> +
> + Â Â Â Â Â Â Â Â Â Â/* å */
> + Âre_compile_pattern ("\xb7\xbd", 2, &r);
> +
> + Â/* The bug cannot be reproduced without initialized fastmap. */
> + Âr.fastmap = malloc (1 << (sizeof (char) * 8));

(1) Use test-skeleton.c for all new test cases please.

(2) Use SBC_MAX?

Can we use SBC_MAX instead of `1 << (sizeof (char) * 8)'?

(3) Setup fastmap before call to re_compile_pattern.

Could you please set the fastmap *before* compiling the pattern.
It shouldn't make any difference since the fastmap is recomputed
at the first call to re_search when ->fastmap_accurate is zero.
However, I'd like the test case to follow the documentation.
The documentation does say you must initialize the fastmap, and
*then* talks about re_compile_pattern. I fully understand that
the documentation is in need of clarification.

OK to checkin to trunk after fixing (1), (2) and (3).

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]