This is the mail archive of the cygwin-xfree mailing list for the Cygwin XFree86 project.


Re: X11R7.5 and C.UTF-8

From: Thomas Dickey <dickey at his dot com>
To: cygwin-xfree at cygwin dot com
Date: Fri, 4 Dec 2009 04:45:02 -0500 (EST)
Subject: Re: X11R7.5 and C.UTF-8
References: <4AE8539E.9080004@cornell.edu> <20091028172216.P60895@mail101.his.com> <4AE8BC12.1060109@cornell.edu> <416096c60910281507n4774534dode1d24ac47d5b0a2@mail.gmail.com> <4B1115EC.7010308@cornell.edu> <4B174C20.1040900@tlinx.org> <416096c60912022348i36504e14l726efc9fc9c360e6@mail.gmail.com> <20091203045401.L85368@mail101.his.com> <loom.20091203T231736-983@post.gmane.org>
Reply-to: cygwin-xfree at cygwin dot com

On Thu, 3 Dec 2009, Eric Blake wrote:

> Thomas Dickey <dickey <at> his.com> writes:
>
>>> This means that characters 0..127 have to be treated as ASCII, but
>
> No, it means that portable characters and control characters must be < 128.
> ASCII meets this characteristic, but so does EBCDIC, as well as UTF-8.  The C
> locale also implies that you can manipulate bytes >= 128 in the naive manner,
> so long as you don't care about characters embedded in those bytes.  And what
> do you know - ASCII, EBCDIC, and UTF-8 all meet this property, too.
>
>>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>>> plain "C" actually does imply UTF-8, which happily is
>>> backward-compatible with ASCII.


>> That's an interpretation that so far hasn't been blessed by the standards
>> people.  Any discussion of this topic should mention that, as a caveat.
>
> Actually, the standards people HAVE spoken - and they agreed with our
> interpretation.  POSIX was INTENTIONALLY written with the intent that a UTF-8
> encoding is valid for the C locale, for the same reason that it was written
> that an EBCDIC encoding is valid for the C locale.  These emails from the
> Austin Group (the folks that write POSIX) are telling:
>
> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982

This is basically your email on the matter.

> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012
>
> But they also admitted that there is still more work needed in POSIX to make
> this intent clearly codified (for example, that control characters must be
> single bytes < 128).

But they have not actually agreed with you yet.

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
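
The byte-level property this thread relies on (UTF-8 encodes every ASCII
character as the same single byte below 128, and builds every other character
only from bytes >= 128) can be checked with a short sketch. This is an
illustration added for readers, not code from any poster; it says nothing
about what POSIX requires of the C locale, which is the point in dispute. The
function name and the sample string are made up for the example.

```python
def utf8_is_ascii_compatible(text: str) -> bool:
    """Check, per character, the two byte-level properties of UTF-8.

    Hypothetical helper for illustration only.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 128:
            # ASCII character: exactly one byte, equal to its code point.
            if encoded != bytes([ord(ch)]):
                return False
        else:
            # Non-ASCII character: every byte of its sequence is >= 128.
            if any(b < 128 for b in encoded):
                return False
    return True


# Sample text mixing ASCII with two- and three-byte UTF-8 characters.
sample = "naïve café, 日本語, plain ASCII"
encoded_sample = sample.encode("utf-8")

# "Naive" byte handling: split on the ASCII space byte without decoding.
# No multibyte character can be cut, because 0x20 never occurs inside one.
byte_parts = encoded_sample.split(b" ")
decoded_parts = [part.decode("utf-8") for part in byte_parts]
```

Splitting the encoded bytes on an ASCII delimiter and decoding each piece
gives the same result as splitting the decoded string, which is the sense in
which byte-oriented code can pass UTF-8 data through untouched.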

