This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Unicode width data inconsistent/outdated

On Aug  7 11:28, Corinna Vinschen wrote:
> On Aug  5 21:06, Thomas Wolff wrote:
> > Am 04.08.2017 um 19:01 schrieb Corinna Vinschen:
> > > This shouldn't matter to you, just keep it in place.  It's a historical,
> > > low footprint conversion for japanese characters without pulling in the
> > > unicode stuff.  Not used on Cygwin so just ignore.
> > I had noticed meanwhile that this is not active in Cygwin, but it's broken
> > anyway for multiple reasons:
> >    * platforms for which wchar_t is not Unicode should be explicitly listed
> >    * if used, the transformation needs to be applied to all non-Unicode
> > locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> >    * for towupper and towlower, the result must be back-transformed into the
> > respective locale encoding
> >    * particularly the locale-specific _l functions inconsistently do not use
> > the transformation but have this note:
> No, no, no.  The functionality is restricted to certain use-cases and
> always was.  It was a paid-for customer extension back in the day and it
> was *sufficient* for the use-cases.  It's not clear how many newlib
> users are still using it, but it's not a good idea to remove it without
> checking first.  That means, ask on the newlib mailing list how many are
> using the historical jp2uc code, and if we don't get a reply within,
> say, a month, we can probably nuke it.

To clarify where we're coming from:

If you look into newlib/libc/locale/locale.c, function __loadlocale,
you'll notice that outside of Cygwin, only six single-, double-, and multibyte
codesets are supported at all:


The multichar/widechar conversion functions for EUCJP, JIS and SJIS were
implemented to have a low footprint in the first place, see, for
instance, __sjis_wctomb in newlib/libc/stdlib/wctomb_r.c.
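To make "low footprint" concrete, here is a minimal sketch of what such an SJIS wctomb can look like. It is not copied from newlib. It assumes that on this path a wchar_t holds the raw SJIS code (lead byte in bits 8..15) rather than a Unicode code point, so no mapping table is needed at all. The function name sjis_wctomb_sketch and the helpers is_sjis_lead/is_sjis_trail are hypothetical, and the lead/trail byte ranges are the commonly used JIS X 0208 ranges, stated here as an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: assumed SJIS lead-byte ranges (JIS X 0208 layout). */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xef);
}

/* Hypothetical helper: assumed SJIS trail-byte ranges (0x7f is excluded). */
static int is_sjis_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7e) || (c >= 0x80 && c <= 0xfc);
}

/* Hypothetical low-footprint SJIS wctomb: the wide char is assumed to carry
   the raw SJIS code, so encoding just splits it into bytes, no lookup table.
   Returns the number of bytes written, 0 for s == NULL (SJIS is stateless),
   or -1 with errno set to EILSEQ for an invalid value. */
int sjis_wctomb_sketch(char *s, wchar_t wc)
{
    unsigned long v = (unsigned long)wc;   /* negative wchar_t becomes huge */
    unsigned char lead = (unsigned char)(v >> 8);
    unsigned char trail = (unsigned char)v;

    if (s == NULL)
        return 0;
    if (v > 0xffff) {                      /* cannot be a 1- or 2-byte code */
        errno = EILSEQ;
        return -1;
    }
    if (lead != 0) {                       /* double-byte character */
        if (is_sjis_lead(lead) && is_sjis_trail(trail)) {
            s[0] = (char)lead;
            s[1] = (char)trail;
            return 2;
        }
        errno = EILSEQ;
        return -1;
    }
    s[0] = (char)trail;                    /* single-byte character */
    return 1;
}
```

With this scheme, sjis_wctomb_sketch(buf, 0x82A0) writes the two bytes 0x82 0xA0 (HIRAGANA LETTER A in SJIS) and returns 2, while a plain ASCII value such as 'A' comes out as one byte.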

This is all about simplification for small targets.  There was never a
requirement that converting a UTF-8 character to wchar_t and converting the
equivalent SJIS character to wchar_t yield the same wide character.
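As an illustration of why the two wide chars can differ, the sketch below decodes HIRAGANA LETTER A both ways. The UTF-8 path produces the Unicode code point U+3042; the SJIS path is assumed, as in the low-footprint approach, to simply pack the two bytes 0x82 0xA0 into 0x82A0. Both function names are hypothetical, and the UTF-8 decoder only handles 3-byte sequences (it rejects overlong forms but does not reject surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence (U+0800..U+FFFF) into a code point.
   Returns 0 for a malformed or overlong sequence.  Illustration only. */
uint32_t utf8_decode3(const unsigned char *p)
{
    uint32_t cp;

    if ((p[0] & 0xF0) != 0xE0 || (p[1] & 0xC0) != 0x80 ||
        (p[2] & 0xC0) != 0x80)
        return 0;
    cp = ((uint32_t)(p[0] & 0x0F) << 12) |
         ((uint32_t)(p[1] & 0x3F) << 6) |
         (uint32_t)(p[2] & 0x3F);
    return cp < 0x800 ? 0 : cp;            /* reject overlong encodings */
}

/* Hypothetical low-footprint SJIS mbtowc step for a double-byte character:
   the wide char is just the two bytes packed together, no Unicode mapping. */
uint32_t sjis_pack2(const unsigned char *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}
```

Feeding the UTF-8 bytes E3 81 82 to utf8_decode3 gives 0x3042, while feeding the SJIS bytes 82 A0 to sjis_pack2 gives 0x82A0: the same character, two different wide values.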

Consequently, Cygwin does not use these conversion functions.  Rather
it uses Windows conversion functions, see the conversion functions in
winsup/cygwin/, to get a consistent wide char representation
(UTF-16).  Another side effect is that Cygwin does not support JIS at
all, only SJIS, see the comment in


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

