Trouble with character sets
Michael Shay
MShay@ABINITIO.COM
Mon Aug 3 15:36:14 GMT 2020
I'm having a problem with Cygwin 3.1.4, changing the character set on the
fly. It seems to work with Cygwin applications, but not with Win32
applications.
I have a Korn shell script:
#!/bin/ksh
OLD_LANG="$LANG"
OLD_LC_ALL="$LC_ALL"
echo "locale on entry"
locale
echo ""
export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252
echo "locale changed to"
locale
echo ""
# With no arguments, run the Win32 program. With one argument, run
# '/bin/echo'. With two arguments, run both.
case $# in
    0 ) echo "Running WIN32 pgm"
        ksh -c 'cygtest.exe ZÇ'
        ;;
    1 ) echo "Running Cygwin 'echo'"
        ksh -c '/bin/echo ZÇ'
        ;;
    2 ) echo "Running WIN32 pgm"
        ksh -c 'cygtest.exe ZÇ'
        echo ""
        echo "Running Cygwin 'echo'"
        ksh -c '/bin/echo ZÇ'
        ;;
    * ) ;;
esac
LC_ALL="$OLD_LC_ALL"
LANG="$OLD_LANG"
and a Win32 application (attached file cygtest.cpp).
I used gdb to see what was happening in child_info_spawn::worker(), when a
Win32 program is started using:
rc = CreateProcessW (runpath,        /* image name w/ full path */
                     cmd.wcs (wcmd), /* what was passed to exec */
                     sa,             /* process security attrs */
                     sa,             /* thread security attrs */
                     TRUE,           /* inherit handles */
                     c_flags,
                     envblock,       /* environment */
                     NULL,
                     &si,
                     &pi);
Specifically, 'cmd.wcs(wcmd)' invokes:
wchar_t *wcs (wchar_t *wbuf, size_t n)
{
  if (n == 1)
    wbuf[0] = L'\0';
  else
    sys_mbstowcs (wbuf, n, buf);
  return wbuf;
}
and sys_mbstowcs():
size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
    {
      f_mbtowc = __utf8_mbtowc;   /* <<<<< this is ALWAYS done, no matter
                                     what charset is in use. */
    }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}
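Read literally, the if-test in sys_mbstowcs() swaps in the UTF-8 converter
only when the converter currently in effect (__MBTOWC) is the ASCII one. A
minimal Python model of that selection is sketched here. The charset name
strings and the function name effective_charset are hypothetical stand-ins
for the function pointers, not anything taken from the Cygwin source.

```python
def effective_charset(locale_charset: str) -> str:
    """Model the if-test: an ASCII converter is replaced by the UTF-8 one."""
    # Hypothetical labels: "ascii" stands in for __ascii_mbtowc and
    # "utf-8" for __utf8_mbtowc. Any other converter is left untouched.
    if locale_charset == "ascii":
        return "utf-8"
    return locale_charset


for charset in ("ascii", "cp1252", "utf-8"):
    print(charset, "->", effective_charset(charset))
```

If this model is right, always reaching the marked line would suggest that
the converter in effect at that point is still the ASCII one rather than a
CP1252 one, though I have not confirmed that.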
CP1252 is a single-byte character set that uses byte values >= 0x80, and a
lone 0xc7 byte ('Ç') is not a valid UTF-8 sequence. As a result, the '0xc7'
character is always translated as '0xc7 0xf0', with the '0xf0' byte
indicating an invalid character in the string.
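The byte pattern can be reproduced outside Cygwin with a small Python sketch.
It assumes that an undecodable byte b is mapped to the 16-bit value
0xF000 | b, which is only inferred from the '0xc7 0xf0' pair seen in
little-endian memory. The handler name private_use_escape and the helper
to_wide_bytes are made up for this illustration.

```python
import codecs


def private_use_escape(err):
    """Map each undecodable byte b to the code point 0xF000 | b (assumed)."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(chr(0xF000 | b) for b in bad), err.end


codecs.register_error("private_use_escape", private_use_escape)


def to_wide_bytes(arg: bytes, charset: str) -> bytes:
    """Decode arg with the given charset and return its UTF-16LE bytes."""
    text = arg.decode(charset, errors="private_use_escape")
    return text.encode("utf-16-le")


arg = b"Z\xc7"  # 'ZÇ' as stored in a CP1252-encoded script

print(to_wide_bytes(arg, "cp1252"))  # b'Z\x00\xc7\x00'
print(to_wide_bytes(arg, "utf-8"))   # b'Z\x00\xc7\xf0'
```

Decoded as CP1252, 0xc7 becomes U+00C7 (bytes c7 00). Decoded as UTF-8 with
the assumed fallback, it becomes U+F0C7 (bytes c7 f0), which matches what gdb
showed.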
This doesn't seem to happen when e.g. '/bin/echo' is run, although I
haven't stepped into the code to see what's happening.
I do not think this is a Cygwin bug, but since the User's Guide says the
locale and charset can be changed on the fly, I don't know what's going
awry.
Any suggestions? If you need more information, I'm happy to provide it.
Mike Shay
Here's the source for the Win32 program. I built it with Visual Studio
2015, to get something running quickly.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygtest.cpp
Type: application/octet-stream
Size: 4428 bytes
Desc: not available
URL: <https://cygwin.com/pipermail/cygwin/attachments/20200803/b38fc8ec/attachment.obj>