This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
bug in EUC-JP converter and charmap
- To: libc-alpha at sources dot redhat dot com
- Subject: bug in EUC-JP converter and charmap
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 4 Sep 2000 14:49:39 +0200 (CEST)
The iconv converter for EUC-JP converts 0xA1C0 to U+FF3C and 0x8FA2B7 to
U+FF5E, and the EUC-JP charmap table lacks both mappings.
But these two mappings are wrong:
Unicode.org's Mappings/EASTASIA/JIS/JIS0208.TXT maps the first one to U+005C.
Unicode.org's Mappings/EASTASIA/JIS/JIS0212.TXT maps the second one to U+007E.
Here is a patch to:
- Change the iconv converter accordingly.
  (Yes, the round trip EUC-JP -> Unicode -> EUC-JP will change
  0xA1C0 to 0x5C and 0x8FA2B7 to 0x7E, but this is not a problem because
  0x5C and 0x7E are unambiguously the "REVERSE SOLIDUS" and "TILDE" in
  EUC-JP.)
- Add commented lines (prefixed with %IRREVERSIBLE%) to the EUC-JP
  charmap, which the testsuite will recognize.
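
To make the round-trip remark concrete, here is a minimal sketch in C. It is
not the glibc converter: the function names eucjp_to_ucs and ucs_to_eucjp are
hypothetical, the EUC-JP byte sequence is packed into a single integer for
brevity, and only the four code points discussed above are modelled.

```c
#include <stdint.h>

/* Hypothetical model, not glibc code: decode one EUC-JP character
   (its bytes packed into an integer) to a UCS code point, using the
   mappings proposed in this patch. */
uint32_t eucjp_to_ucs(uint32_t euc)
{
    switch (euc) {
    case 0x5C:     return 0x005C;  /* ASCII-range REVERSE SOLIDUS */
    case 0x7E:     return 0x007E;  /* ASCII-range TILDE */
    case 0xA1C0:   return 0x005C;  /* JIS X 0208 entry, after the patch */
    case 0x8FA2B7: return 0x007E;  /* JIS X 0212 entry, after the patch */
    default:       return 0xFFFD;  /* placeholder for "not modelled here" */
    }
}

/* Hypothetical model: encode a UCS code point back to EUC-JP.
   Only the single-byte forms are produced, so the decoding of
   0xA1C0 and 0x8FA2B7 is not reversible. */
uint32_t ucs_to_eucjp(uint32_t ucs)
{
    switch (ucs) {
    case 0x005C: return 0x5C;
    case 0x007E: return 0x7E;
    default:     return 0;         /* placeholder for "not modelled here" */
    }
}
```

In this model, ucs_to_eucjp(eucjp_to_ucs(0xA1C0)) yields 0x5C and
ucs_to_eucjp(eucjp_to_ucs(0x8FA2B7)) yields 0x7E, while 0x5C and 0x7E
themselves survive the round trip unchanged.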
2000-09-03 Bruno Haible <haible@clisp.cons.org>
* iconvdata/jis0208.c (__jis0208_to_ucs): Map EUC-JP 0xA1C0 to U+005C.
* iconvdata/jis0212.c (__jisx0212_to_ucs): Map EUC-JP 0x8FA2B7 to
U+007E.
2000-09-03 Bruno Haible <haible@clisp.cons.org>
* charmaps/EUC-JP: Nonreversibly map 0xA1C0 to U+005C and 0x8FA2B7 to
U+007E.
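
As a cross-check of the table slots touched in the patch below, the following
sketch computes a linear index from the two trailing EUC-JP bytes. It assumes
the usual 94x94 row/cell layout, where both bytes lie in 0xA1..0xFE; the
function name jis_linear_index is hypothetical and is not taken from the
glibc sources.

```c
#include <stdint.h>

/* Hypothetical helper, not glibc code: linear index of a 94x94
   double-byte character given its two EUC-JP bytes (for JIS X 0212,
   the two bytes that follow the 0x8F prefix).
   Returns -1 if either byte is outside 0xA1..0xFE. */
int jis_linear_index(uint8_t b1, uint8_t b2)
{
    if (b1 < 0xA1 || b1 > 0xFE || b2 < 0xA1 || b2 > 0xFE)
        return -1;
    /* 94 cells per row; rows and cells both start at 0xA1 in EUC-JP. */
    return (b1 - 0xA1) * 94 + (b2 - 0xA1);
}
```

Under that assumption, jis_linear_index(0xA1, 0xC0) evaluates to 0x1f, which
is the slot changed in jis0208.c. For 0x8FA2B7, the difference
jis_linear_index(0xA2, 0xB7) - jis_linear_index(0xA2, 0xAF) evaluates to 8,
which matches the ninth entry of the __jisx0212_to_ucs excerpt if its first
entry corresponds to 0x8FA2AF, as the neighbouring charmap lines suggest.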
*** glibc-20000831/iconvdata/jis0208.c.bak Tue Sep 7 16:50:40 1999
--- glibc-20000831/iconvdata/jis0208.c Sun Sep 3 11:57:35 2000
***************
*** 67,73 ****
[0x0010] = 0xffe3, [0x0011] = 0xff3f, [0x0012] = 0x30fd, [0x0013] = 0x30fe,
[0x0014] = 0x309d, [0x0015] = 0x309e, [0x0016] = 0x3003, [0x0017] = 0x4edd,
[0x0018] = 0x3005, [0x0019] = 0x3006, [0x001a] = 0x3007, [0x001b] = 0x30fc,
! [0x001c] = 0x2015, [0x001d] = 0x2010, [0x001e] = 0xff0f, [0x001f] = 0xff3c,
[0x0020] = 0x301c, [0x0021] = 0x2016, [0x0022] = 0xff5c, [0x0023] = 0x2026,
[0x0024] = 0x2025, [0x0025] = 0x2018, [0x0026] = 0x2019, [0x0027] = 0x201c,
[0x0028] = 0x201d, [0x0029] = 0xff08, [0x002a] = 0xff09, [0x002b] = 0x3014,
--- 67,73 ----
[0x0010] = 0xffe3, [0x0011] = 0xff3f, [0x0012] = 0x30fd, [0x0013] = 0x30fe,
[0x0014] = 0x309d, [0x0015] = 0x309e, [0x0016] = 0x3003, [0x0017] = 0x4edd,
[0x0018] = 0x3005, [0x0019] = 0x3006, [0x001a] = 0x3007, [0x001b] = 0x30fc,
! [0x001c] = 0x2015, [0x001d] = 0x2010, [0x001e] = 0xff0f, [0x001f] = 0x005c,
[0x0020] = 0x301c, [0x0021] = 0x2016, [0x0022] = 0xff5c, [0x0023] = 0x2026,
[0x0024] = 0x2025, [0x0025] = 0x2018, [0x0026] = 0x2019, [0x0027] = 0x201c,
[0x0028] = 0x201d, [0x0029] = 0xff08, [0x002a] = 0xff09, [0x002b] = 0x3014,
*** glibc-20000831/iconvdata/jis0212.c.bak Tue Sep 7 16:50:46 1999
--- glibc-20000831/iconvdata/jis0212.c Sun Sep 3 11:58:09 2000
***************
*** 111,117 ****
const uint16_t __jisx0212_to_ucs[] =
{
0x02d8, 0x02c7, 0x00b8, 0x02d9, 0x02dd, 0x00af, 0x02db, 0x02da,
! 0xff5e, 0x0384, 0x0385, 0x00a1, 0x00a6, 0x00bf, 0x00ba, 0x00aa,
0x00a9, 0x00ae, 0x2122, 0x00a4, 0x2116, 0x0386, 0x0388, 0x0389,
0x038a, 0x03aa, 000000, 0x038c, 000000, 0x038e, 0x03ab, 000000,
0x038f, 000000, 000000, 000000, 000000, 0x03ac, 0x03ad, 0x03ae,
--- 111,117 ----
const uint16_t __jisx0212_to_ucs[] =
{
0x02d8, 0x02c7, 0x00b8, 0x02d9, 0x02dd, 0x00af, 0x02db, 0x02da,
! 0x007e, 0x0384, 0x0385, 0x00a1, 0x00a6, 0x00bf, 0x00ba, 0x00aa,
0x00a9, 0x00ae, 0x2122, 0x00a4, 0x2116, 0x0386, 0x0388, 0x0389,
0x038a, 0x03aa, 000000, 0x038c, 000000, 0x038e, 0x03ab, 000000,
0x038f, 000000, 000000, 000000, 000000, 0x03ac, 0x03ad, 0x03ae,
*** glibc-20000831/localedata/charmaps/EUC-JP.bak Wed Jul 12 18:11:45 2000
--- glibc-20000831/localedata/charmaps/EUC-JP Sun Sep 3 12:01:03 2000
***************
*** 276,281 ****
--- 276,282 ----
<U2015> /xa1/xbd HORIZONTAL BAR
<U2010> /xa1/xbe HYPHEN
<UFF0F> /xa1/xbf FULLWIDTH SOLIDUS
+ %IRREVERSIBLE%<U005C> /xa1/xc0 REVERSE SOLIDUS
<U301C> /xa1/xc1 WAVE DASH
<U2016> /xa1/xc2 DOUBLE VERTICAL LINE
<UFF5C> /xa1/xc3 FULLWIDTH VERTICAL LINE
***************
*** 7135,7140 ****
--- 7136,7142 ----
<U00AF> /x8f/xa2/xb4 MACRON
<U02DB> /x8f/xa2/xb5 OGONEK
<U02DA> /x8f/xa2/xb6 RING ABOVE
+ %IRREVERSIBLE%<U007E> /x8f/xa2/xb7 TILDE
<U0384> /x8f/xa2/xb8 GREEK TONOS
<U0385> /x8f/xa2/xb9 GREEK DIALYTIKA TONOS
<U00A1> /x8f/xa2/xc2 INVERTED EXCLAMATION MARK