This is the mail archive of the
mailing list for the Cygwin project.
Re: Non-canonical mode input via tcsetattr(), under mintty console
- From: Thomas Wolff <towo at towo dot net>
- To: cygwin at cygwin dot com
- Date: Sun, 28 Feb 2010 13:24:09 +0100
- Subject: Re: Non-canonical mode input via tcsetattr(), under mintty console
- References: <firstname.lastname@example.org>
Dave Lee wrote:
> Hi all,
>
> I was testing a program that uses non-canonical mode input via
> tcsetattr().
> Specifically, I entered the Chinese character "例" (which means "rule"
> or "example"). It occupies 3 bytes in UTF-8 representation: E4, BE, 8B.
> On the standard console, the read() call returned THREE bytes (n == 3), and
> (not surprisingly) E4, BE and 8B were returned in buf.
> On the mintty console, the read() call returned ONE byte (n == 1), and only
> E4 was returned in buf. I could grab the other two bytes if I did
> additional calls to read().

This is absolutely in line with the specified interface of read(),
whether or not you apply some tcsetattr settings, and whether or not
there is a difference between the Cygwin console and mintty. read() is a
traditional byte-oriented function and has no knowledge or handling of
character encoding, and there is no guarantee that a multi-byte
character comes in one piece. (Even if mintty were changed to try to
feed them in one piece, there would still be no guarantee that you
receive them in one piece.)
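
Since read() may deliver a multi-byte character in pieces, the caller has to keep reading until the sequence is complete. A minimal sketch of such a loop, assuming UTF-8 input and blocking reads (VMIN >= 1, so a return value of 0 means end of input). The names utf8_seq_len and read_utf8_char are hypothetical helpers made up for illustration, not part of any library:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Number of bytes in a UTF-8 sequence, judged from its first byte;
 * 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;  /* e.g. E4 BE 8B */
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;                             /* continuation or invalid byte */
}

/* Hypothetical helper: read exactly one UTF-8 sequence from fd into buf
 * (room for at least 4 bytes). Returns the sequence length, 0 on end of
 * input, -1 on a read error, an invalid lead byte, or end of input in
 * mid-character. Continuation bytes are not validated in this sketch. */
static int read_utf8_char(int fd, unsigned char *buf)
{
    size_t have = 0, want = 1;

    while (have < want) {
        ssize_t n = read(fd, buf + have, want - have);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            return have == 0 ? 0 : -1;    /* end of input */
        have += (size_t)n;
        if (want == 1) {                  /* first byte tells the length */
            want = utf8_seq_len(buf[0]);
            if (want == 0)
                return -1;
        }
    }
    return (int)want;
}
```

With a loop like this it no longer matters whether the terminal hands over E4, BE, 8B in a single read() or one byte at a time; buf ends up with the same three bytes in both cases.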
You have four options (two for each case, depending on whether you want
UTF-8 or Unicode words, i.e. wide characters, in your program):
* Read bytes and decode UTF-8 yourself. Basically simple as long as you
are careful to avoid errors.
* Read bytes and transform with one of the mbtowc (multi-byte to
wide-character) functions (provided you want characters as Unicode
words, not UTF-8 sequences in your program). The interface of those
functions is a little bit tricky, though.
* Use wide character input functions (e.g. from the ncursesw library)
(provided... see above). They may not be completely flexible with
respect to specific interaction requirements (tcsetattr settings...),
though, I'm not sure.
* Use wide character input functions and transform back to UTF-8 with
one of the wctomb functions, if you need to.
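
As an illustration of the mbtowc/wctomb route, here is a rough sketch using the restartable variants mbrtowc() and wcrtomb(), which keep the state of a partially received character in an mbstate_t between calls. It rests on several assumptions: the names wc_decoder, wc_decoder_feed, wc_to_bytes and use_utf8_locale are made up for this example, the locale names tried may not exist on every system, and out must have room for len wide characters. On platforms where wchar_t is 16 bits wide, characters outside the Basic Multilingual Plane would need extra care that is not shown here.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Select a UTF-8 locale; returns 1 on success. The two names tried here
 * are assumptions. A real program would usually call
 * setlocale(LC_CTYPE, "") and rely on the user's environment. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical decoder state, kept across read() calls. */
typedef struct {
    mbstate_t mbs;             /* conversion state used by mbrtowc() */
} wc_decoder;

static void wc_decoder_init(wc_decoder *d)
{
    memset(d, 0, sizeof *d);   /* an all-zero mbstate_t is the initial state */
}

/* Feed one chunk of bytes, e.g. whatever a single read() returned.
 * Stores the completed wide characters in out (room for len entries)
 * and returns how many were stored, or (size_t)-1 on an invalid
 * sequence. A character split across chunks is finished on a later call. */
static size_t wc_decoder_feed(wc_decoder *d, const char *in, size_t len,
                              wchar_t *out)
{
    size_t count = 0;

    while (len > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, in, len, &d->mbs);

        if (r == (size_t)-1)
            return (size_t)-1; /* invalid sequence */
        if (r == (size_t)-2)
            break;             /* incomplete: bytes were absorbed into mbs */
        if (r == 0)
            r = 1;             /* a null character consumed one byte */
        out[count++] = wc;
        in += r;
        len -= r;
    }
    return count;
}

/* Convert one wide character back to multi-byte form. buf needs at
 * least MB_CUR_MAX bytes. Returns the byte count or (size_t)-1. */
static size_t wc_to_bytes(wchar_t wc, char *buf)
{
    mbstate_t st;

    memset(&st, 0, sizeof st);
    return wcrtomb(buf, wc, &st);
}
```

Feeding the single byte E4 yields no character yet; feeding BE and 8B afterwards completes it. The tricky part of the interface is mostly the two special return values, (size_t)-1 and (size_t)-2, which have to be checked before the result is used as a byte count.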