Trouble with character sets
Brian Inglis
Brian.Inglis@SystematicSw.ab.ca
Mon Aug 3 16:31:15 GMT 2020
On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
> 0 ) echo "Running WIN32 pgm"
> ksh -c 'cygtest.exe ZÇ'
> ;;
> 1 ) echo "Running Cygwin 'echo'"
> ksh -c '/bin/echo ZÇ'
> ;;
> 2 ) echo "Running WIN32 pgm"
> ksh -c 'cygtest.exe ZÇ'
> echo ""
> echo "Running Cygwin 'echo'"
> ksh -c '/bin/echo ZÇ'
> ;;
> * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what happens in child_info_spawn::worker() when a
> Win32 program is started using:
> rc = CreateProcessW (runpath,        /* image name w/ full path */
>                      cmd.wcs (wcmd), /* what was passed to exec */
>                      sa,             /* process security attrs */
>                      sa,             /* thread security attrs */
>                      TRUE,           /* inherit handles */
>                      c_flags,
>                      envblock,       /* environment */
>                      NULL,
>                      &si,
>                      &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
> wchar_t *wcs (wchar_t *wbuf, size_t n)
> {
>   if (n == 1)
>     wbuf[0] = L'\0';
>   else
>     sys_mbstowcs (wbuf, n, buf);
>   return wbuf;
> }
> and sys_mbstowcs():
> size_t __reg3
> sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
> {
>   mbtowc_p f_mbtowc = __MBTOWC;
>   if (f_mbtowc == __ascii_mbtowc)
>     {
>       f_mbtowc = __utf8_mbtowc; /* <<<<< this is ALWAYS done, no matter what charset is in use. */
>     }
>   return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
> }
> Since CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.
Try:
$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]