This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/13061] New: iconv mapping of 0xA8 0xEC in CP1258 is non-canonical
- From: "bruno at clisp dot org" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sources dot redhat dot com
- Date: Sat, 6 Aug 2011 16:30:51 +0000
- Subject: [Bug localedata/13061] New: iconv mapping of 0xA8 0xEC in CP1258 is non-canonical
- Auto-submitted: auto-generated
http://sourceware.org/bugzilla/show_bug.cgi?id=13061
Summary: iconv mapping of 0xA8 0xEC in CP1258 is non-canonical
Product: glibc
Version: 2.14
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: libc-locales@sources.redhat.com
ReportedBy: bruno@clisp.org
Bug 12777 <http://sourceware.org/bugzilla/show_bug.cgi?id=12777>
was fixed to map U+0385 (like U+1FEE) to 0xA8 0xEC. Good.
But at the same time, in the reverse direction, 0xA8 0xEC ought to map to
U+0385, not to U+1FEE. Why?
1) http://www.unicode.org/charts/PDF/U1F00.pdf states
that the decomposition of U+1FEE is U+0385. That is, U+0385 is a "simpler"
Unicode character than U+1FEE, although both look very similar
(cf. http://www.unicode.org/charts/PDF/U1F00.pdf and
http://www.unicode.org/charts/PDF/U0370.pdf).
2) According to http://www.unicode.org/versions/Unicode6.0.0/ch07.pdf,
the block U+0370..U+03FF is more for modern Greek, whereas the block
U+1F00..U+1FFF is mostly for ancient Greek. But CP1258 is about modern Greek.
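The decomposition argument in point 1 can be checked against the Unicode Character Database. The following is a minimal sketch, assuming Python 3 with its bundled unicodedata module and its built-in cp1258 codec. That codec is not glibc's iconv and is used here only to show which code points the two bytes stand for individually; the variable names are illustrative.

```python
import unicodedata

# Decomposition fields as recorded in the Unicode Character Database.
print("U+1FEE decomposes to:", unicodedata.decomposition("\u1FEE"))
print("U+0385 decomposes to:", unicodedata.decomposition("\u0385"))

# Python's cp1258 codec (an assumption of this sketch, not glibc's iconv)
# decodes byte by byte and performs no composition.
raw = b"\xA8\xEC".decode("cp1258")

# NFC yields the canonical composed form of the decoded sequence.
composed = unicodedata.normalize("NFC", raw)

# NFC applied to U+1FEE itself, for comparison.
from_1fee = unicodedata.normalize("NFC", "\u1FEE")

for label, text in (("raw", raw), ("composed", composed), ("from_1fee", from_1fee)):
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

If both normalized strings come out as U+0385, that is the canonical single-character form of the byte pair, which matches the mapping requested in this report.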
To reproduce:
$ printf '\xA8\xEC' | iconv -f CP1258 -t UCS-4LE | od -t x4
0000000 00001fee
0000004
Should be:
$ printf '\xA8\xEC' | iconv -f CP1258 -t UCS-4LE | od -t x4
0000000 00000385
0000004
Attached is a probable fix (untested).