This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/16061] New: Review / update transliteration data
- From: "myllynen at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Fri, 18 Oct 2013 08:04:53 +0000
- Subject: [Bug localedata/16061] New: Review / update transliteration data
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=16061
Bug ID: 16061
Summary: Review / update transliteration data
Product: glibc
Version: 2.18
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: myllynen at redhat dot com
CC: libc-locales at sourceware dot org
Judging by the comments in them, the localedata/locales/translit_* files were
probably generated, at least in part, from some version of UnicodeData.txt
(93a568 suggests that the last major update was for Unicode 3.2, and 17b16e
suggests that they originally came from an external contributor). However,
some characters are missing even from the Latin-1 Supplement block, and in
general it does not seem possible to update the files using UnicodeData.txt
alone. Some of the rules live in locale/C-translit.h /
locale/C-translit.h.in, which also contain local changes (like 61d5a6 / 2a81ea).
It will likely take a lot of work to understand how the files were generated
in the first place, how to identify the relevant local changes, and how to
automate the process of updating them in the future.
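
As a starting point for finding gaps such as the ones in the Latin-1
Supplement block, the following is a minimal sketch that lists the code points
in a given range that have no rule in a translit file. It assumes that each
rule line begins with the source character written as <UXXXX>; the helper
names and the sample lines are hypothetical and are not copied from glibc.

```python
import re

# Assumption: a rule line starts with its source character as <UXXXX>.
RULE_START = re.compile(r"^<U([0-9A-Fa-f]{4,8})>")


def covered_code_points(lines):
    """Return the set of code points that appear as the source of a rule."""
    covered = set()
    for line in lines:
        match = RULE_START.match(line.strip())
        if match:
            covered.add(int(match.group(1), 16))
    return covered


def missing_in_block(lines, first, last):
    """List the code points in [first, last] that have no rule in lines."""
    covered = covered_code_points(lines)
    return [cp for cp in range(first, last + 1) if cp not in covered]


# Made-up sample input for illustration only (not real glibc data).
sample = [
    "% LATIN CAPITAL LETTER AE",
    '<U00C6> "<U0041><U0045>"',
    "% LATIN CAPITAL LETTER A WITH GRAVE",
    "<U00C0> <U0041>",
]

missing = missing_in_block(sample, 0x00C0, 0x00C7)
print([f"U+{cp:04X}" for cp in missing])
```

On real files the result would still need manual review, since not every code
point in a block (for example, symbols rather than letters) necessarily needs
a transliteration rule.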
Some individual examples of currently missing characters are U+00D8 (Ø) and
U+0110 (Đ), whereas other characters from the same blocks (Latin-1 Supplement
and Latin Extended-A, respectively), like U+00C6 (Æ) and U+0141 (Ł), are
present. Some characters (like U+2033, ″) have their decomposition defined
exactly as in Unicode, but others (like U+00D6, Ö) have a decomposition
defined in Unicode but not in glibc.
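
To see what Unicode itself says about the characters mentioned above, here is
a minimal sketch that prints their decomposition fields using Python's
standard unicodedata module. It reflects the Unicode version bundled with the
Python interpreter, not Unicode 3.2, and the helper name is hypothetical.

```python
import unicodedata

# Code points mentioned in this report.
CODE_POINTS = [0x00D8, 0x0110, 0x00C6, 0x0141, 0x2033, 0x00D6]


def decomposition_report(code_points):
    """Map each code point to its Unicode decomposition field ('' if none)."""
    return {cp: unicodedata.decomposition(chr(cp)) for cp in code_points}


report = decomposition_report(CODE_POINTS)
for cp, decomposition in report.items():
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name}: {decomposition or '(no decomposition)'}")
```

U+00D8, U+0110, U+00C6 and U+0141 have an empty decomposition field, so rules
for them cannot be derived from that field of UnicodeData.txt, whereas U+2033
and U+00D6 do have one.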