This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: Unicode 3.2 support (6)


Hello Bruno,

First of all, thank you for your great effort in updating the glibc
charmaps and iconv converters to Unicode 3.2.  :-)

One thing is special about BIG5-HKSCS and HKSCS-2001 on GNU/Linux,
though: quite a few important components (XFree86, Qt, and most
existing TrueType Big5-HKSCS fonts on the market, among others) do not
yet fully support the HKSCS-2001 and ISO 10646-2:2001 standards,
specifically mappings beyond the BMP.  So, while mapping HKSCS-2001 to
ISO 10646-2:2001 is definitely desirable, it is premature because the
rest of the system is not ready.  That will probably have to wait
until 2003 or 2004.  We have discussed this with Mr. Andrew Fung of
the ITSD (the HKSARG department responsible for the HKSCS-2001
implementation), and he agrees with this migration plan.

So, in the interim, please consider using the following scheme for the
default BIG5-HKSCS charmap/converter:

    BIG5-HKSCS --> ISO 10646-1:2000 + PUA

    PUA + ISO 10646-1:2000 \___\  BIG5-HKSCS
          ISO 10646-2:2001 /   /

or perhaps make two versions of "BIG5-HKSCS" in glibc: say,
"BIG5-HKSCS-1999", which maps BIG5-HKSCS to ISO 10646-1:2000 + PUA,
and "BIG5-HKSCS-2001", which maps BIG5-HKSCS to ISO 10646-2:2001 too,
with "BIG5-HKSCS" as an alias of "BIG5-HKSCS-1999" for now.  The alias
could then be switched to "BIG5-HKSCS-2001" a year or two from now.  :-)
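
To illustrate, here is a rough Python sketch of both ideas (the interim
scheme and the alias switch).  It is only an illustration, not glibc
code: the table entries are made-up placeholders rather than real HKSCS
mappings, the function names are hypothetical, and it assumes that both
variants accept PUA, BMP and plane-2 code points when converting to
BIG5-HKSCS.

```python
# Sketch only: the table entries below are made-up placeholders,
# NOT real HKSCS mappings, and none of this is glibc code.

# (BIG5-HKSCS code, interim code point in ISO 10646-1:2000 or the PUA,
#  ISO 10646-2:2001 code point beyond the BMP, or None if there is none)
ENTRIES = [
    (0x8840, 0xE000, 0x20000),  # placeholder: PUA now, plane 2 later
    (0x8841, 0x4E00, None),     # placeholder: already in the BMP
]

# "BIG5-HKSCS" starts out as an alias of the 1999 variant; switching it
# later means changing only this one entry.
ALIASES = {"BIG5-HKSCS": "BIG5-HKSCS-1999"}


def build_tables(variant):
    """Return (to_ucs, from_ucs) dictionaries for one variant."""
    to_ucs, from_ucs = {}, {}
    for code, interim, beyond_bmp in ENTRIES:
        # Encoding direction: accept every known code point (assumption:
        # both variants are equally lenient here).
        from_ucs[interim] = code
        if beyond_bmp is not None:
            from_ucs[beyond_bmp] = code
        # Decoding direction: only the 2001 variant emits code points
        # beyond the BMP.
        if variant == "BIG5-HKSCS-2001" and beyond_bmp is not None:
            to_ucs[code] = beyond_bmp
        else:
            to_ucs[code] = interim
    return to_ucs, from_ucs


def decode(name, codes):
    """Map a list of BIG5-HKSCS codes to a Unicode string."""
    to_ucs, _ = build_tables(ALIASES.get(name, name))
    return "".join(chr(to_ucs[c]) for c in codes)


def encode(name, text):
    """Map a Unicode string to a list of BIG5-HKSCS codes."""
    _, from_ucs = build_tables(ALIASES.get(name, name))
    return [from_ucs[ord(ch)] for ch in text]
```

With these placeholder tables, decode("BIG5-HKSCS", [0x8840]) yields the
PUA character U+E000, while encode() accepts either U+E000 or U+20000
for the same code.  Pointing the alias at "BIG5-HKSCS-2001" changes only
the decoding direction.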

There is another intricacy in BIG5-HKSCS, involving the unified
characters in big5cmp.txt.  If you like, please take a look at:

	http://www.thizlinux.com/~anthony/hkscs/

James has made great progress in this area too, and he has written a
big5hkscs.c that is very memory-efficient, as well as a CHARMAP.  We
haven't posted here yet about our patches mainly because of a lack of
time to optimize the CHARMAP further and to provide the BIG5-HKSCS
test data.  (Okay, and because I was lazy too... or, kekem, I had to
meet other deadlines.  ;-)

But yes, we'll continue this discussion.  :-)

Cheers,

Anthony

On Wed, Apr 17, 2002 at 04:47:23PM +0200, Bruno Haible wrote:
> 
> Here is a patch to upgrade the BIG5-HKSCS charmap and iconv converter to
> Unicode 3.2. Some characters were added to Unicode 3.1 and Unicode 3.2 for
> better convertibility of EUC-TW (CNS11643), and in Unicode 3.2 furthermore the
> mapping tables between HKSCS and Unicode (in the Unihan.txt file) were updated.
> This patch uses these tables, extracted from Unihan.txt.
> 
> As a consequence of this update, the mapping has one more irreversible mapping
> pair. The testdata/BIG5HKSCS is modified to remove this non-reversibly-mappable
> character. And of course, testdata/BIG5HKSCS..UTF8 is regenerated with the new
> mappings.
> 
> 
> ChangeLog:
> 2002-04-15  Bruno Haible  <bruno@clisp.org>
> 
> 	* iconvdata/big5hkscs.c (big5hkscs_to_ucs): Change element type to
> 	uint32_t. Update to Unicode 3.2.
> 	(from_ucs4, from_ucs4_idx): Update to Unicode 3.2.
> 	(BODY for TO_LOOP): Handle the ASCII range specially.
> 	* iconvdata/BIG5HKSCS.irreversible: Add one more entry.
> 	* iconvdata/testdata/BIG5HKSCS: Remove a character.
> 	* iconvdata/testdata/BIG5HKSCS..UTF-8: Regenerated.
> 
> localedata/ChangeLog:
> 2002-04-15  Bruno Haible  <bruno@clisp.org>
> 
> 	* charmaps/BIG5-HKSCS: Update to Unicode 3.2.
> 
> [The patch is too large for this mailing list. You can download it from
> ftp://ftp.ilog.fr/pub/Users/haible/gnu/glibc-unicode32-patch6.bz2 .]

-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory   <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org>       http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp!           http://www.olvc.ab.ca/

