This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


[PATCH] fix surrogate pair handling

From: Isamu Hasegawa <isamu at yamato dot ibm dot com>
To: Ulrich Drepper <drepper at redhat dot com>
Cc: libc-alpha at sources dot redhat dot com, jir at yamato dot ibm dot com, shoji at jp dot ibm dot com, isamu at yamato dot ibm dot com
Date: Wed, 19 Feb 2003 22:36:44 +0900
Subject: [PATCH] fix surrogate pair handling

Hi again,

My co-worker asked me to forward a bug report and a fix to you.
He found that the iconv UTF-16 module doesn't work correctly when
converting specific surrogate pairs from UTF-16.  Converting from UCS-4
to UTF-16 doesn't have any problems.

The test case is as follows:

$ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | od -bx 
0000000 330 077 337 377
        3fd8 ffdf
0000004

$ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | iconv -f UTF-16BE -t UCS-4BE | od -bx 
iconv: illegal input sequence at position 0
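
For reference, the bytes printed by the first command can be checked by
hand.  The following is only a minimal sketch of the standard UTF-16
encoding arithmetic; the function name ucs4_to_surrogates is made up for
this mail and is not taken from iconvdata/utf-16.c.

```c
#include <stdint.h>

/* Hypothetical helper, not glibc code: split a code point in the range
   U+10000..U+10FFFF into a UTF-16 surrogate pair.
   Returns 0 on success, -1 if the code point needs no surrogates
   or lies outside the Unicode range.  */
int
ucs4_to_surrogates (uint32_t c, uint16_t *hi, uint16_t *lo)
{
  if (c < 0x10000 || c > 0x10ffff)
    return -1;

  c -= 0x10000;                            /* 20 bits remain */
  *hi = (uint16_t) (0xd800 + (c >> 10));   /* upper 10 bits */
  *lo = (uint16_t) (0xdc00 + (c & 0x3ff)); /* lower 10 bits */
  return 0;
}
```

For U+1FFFF this gives 0xD83F and 0xDFFF, i.e. the big-endian bytes
d8 3f df ff, which match the octal dump above (od -x shows them
byte-swapped on a little-endian host).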

According to the Unicode specification, the range of the high surrogate
(first word) is U+D800 through U+DBFF and the range of the low surrogate
(second word) is U+DC00 through U+DFFF.  However, the UTF-16 module does
not seem to respect these ranges.
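
To illustrate the ranges, here is a minimal sketch of a decoder that
checks both words before combining them.  It is not the attached patch;
the names is_high_surrogate, is_low_surrogate and surrogates_to_ucs4 are
made up for this example.

```c
#include <stdint.h>

/* Hypothetical helpers, not glibc code.  */

/* High (first) surrogate: U+D800 through U+DBFF.  */
int
is_high_surrogate (uint16_t u)
{
  return u >= 0xd800 && u <= 0xdbff;
}

/* Low (second) surrogate: U+DC00 through U+DFFF, upper bound inclusive.  */
int
is_low_surrogate (uint16_t u)
{
  return u >= 0xdc00 && u <= 0xdfff;
}

/* Combine a surrogate pair into a UCS-4 value.
   Returns 0 on success, -1 if either word is out of range.  */
int
surrogates_to_ucs4 (uint16_t hi, uint16_t lo, uint32_t *out)
{
  if (!is_high_surrogate (hi) || !is_low_surrogate (lo))
    return -1;

  *out = 0x10000 + (((uint32_t) (hi - 0xd800) << 10)
                    | (uint32_t) (lo - 0xdc00));
  return 0;
}
```

With these checks the pair 0xD83F 0xDFFF from the test case decodes back
to U+1FFFF, while a pair whose second word is not in U+DC00..U+DFFF is
rejected.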

I have attached a more detailed test case and a fix to this mail.
What do you think of them?

2003-02-19  Jiro Sekiba  <jir at yamato dot ibm dot com>

	* iconvdata/utf-16.c (gconv_end): Fix range of low surrogate.

Thanks,
-- 
Isamu Hasegawa
IBM Japan, Ltd.

Attachment: utf-16.patch
Description: Binary data

Attachment: utf16.c
Description: Binary data

Follow-Ups:
- Re: [PATCH] fix surrogate pair handling
  - From: Ulrich Drepper
