This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Hi again,

My co-worker asked me to forward a bug report and a fix to you. He found that the iconv UTF-16 module does not work correctly when converting certain surrogate pairs from UTF-16. Converting from UCS-4 to UTF-16 does not have any problems.

The test case is as follows:

  $ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | od -bx
  0000000 330 077 337 377
          3fd8 ffdf
  0000004
  $ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | iconv -f UTF-16BE -t UCS-4BE | od -bx
  iconv: illegal input sequence at position 0

According to the Unicode specification, the range of the high surrogate (first word) is U+D800 through U+DBFF, and the range of the low surrogate (last word) is U+DC00 through U+DFFF. However, the UTF-16 module does not seem to respect these ranges.

I have attached a more detailed test case and a fix to this mail. What do you think of them?

2003-02-19  Jiro Sekiba  <jir at yamato dot ibm dot com>

	* iconvdata/utf-16.c (gconv_end): Fix range of low surrogate.

Thanks,
--
Isamu Hasegawa
IBM Japan, Ltd.
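For illustration, the surrogate ranges quoted above can be sketched as a small encode/decode pair in C. This is not the code from utf-16.c or from the attached patch; the helper names (is_high_surrogate, is_low_surrogate, utf16_encode_pair, utf16_decode_pair) are hypothetical and only show the range checks a decoder would be expected to apply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not glibc code. */

/* High (first) surrogate: U+D800 through U+DBFF. */
static int is_high_surrogate(uint16_t w)
{
  return w >= 0xD800 && w <= 0xDBFF;
}

/* Low (second) surrogate: U+DC00 through U+DFFF, both ends inclusive. */
static int is_low_surrogate(uint16_t w)
{
  return w >= 0xDC00 && w <= 0xDFFF;
}

/* Split a code point in U+10000..U+10FFFF into a surrogate pair. */
static void utf16_encode_pair(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  assert(cp >= 0x10000 && cp <= 0x10FFFF);
  uint32_t v = cp - 0x10000;               /* 20-bit value */
  *hi = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
  *lo = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}

/* Combine a surrogate pair into a code point.
   Returns 0 on success, -1 if either word is out of range. */
static int utf16_decode_pair(uint16_t hi, uint16_t lo, uint32_t *cp)
{
  if (!is_high_surrogate(hi) || !is_low_surrogate(lo))
    return -1;
  *cp = 0x10000
        + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
  return 0;
}
```

With these helpers, U+1FFFF from the test case encodes to the words D83F DFFF (the bytes 330 077 337 377 in the od output), and decoding that pair gives U+1FFFF back. The low surrogate DFFF sits exactly at the upper end of its range, so a decoder has to treat that bound as inclusive.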
Attachment:
utf-16.patch
Description: Binary data
Attachment:
utf16.c
Description: Binary data