This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
- From: Brian Inglis <Brian dot Inglis at SystematicSw dot ab dot ca>
- To: cygwin at cygwin dot com
- Date: Fri, 30 Sep 2016 23:15:02 -0600
- Subject: Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
- References: <123291584.20161001051347@vanav.org> <f4712f19-ef37-2040-1cda-3e352f09c8cd@SystematicSw.ab.ca>
- Reply-to: Brian dot Inglis at SystematicSw dot ab dot ca
On 2016-09-30 22:34, Brian Inglis wrote:
On 2016-09-30 20:13, Ivan Vanyushkin wrote:
Something has changed in version 2.6.0, and now UTF-8 text can't be displayed in Windows console (cmd).
1. Create a file "test.txt" with non-ASCII text in UTF-8 encoding.
2. Run "cmd".
3. Run:
C:\Cygwin\bin\cat test.txt
▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒ ▒▒▒▒▒▒ 8000 ▒▒. ▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒.
Non-ASCII text is not readable. Older Cygwin 2.5.2 has no such issue.
C:\Cygwin\bin\uname -a
CYGWIN_NT-10.0 PCName 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
C:\Cygwin\bin\locale
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
Same issue with any other commands like "grep", or with utilities built and run under Cygwin 2.6.0.
Same issue in other Windows consoles, like ConEmu or FAR Manager.
If I change the Windows console encoding to UTF-8 (run: "chcp 65001"), the file is displayed correctly by the native command
(run: "type test.txt"), but Cygwin "cat" still has the same issue.
How should I display UTF-8 now?
No problems here - same setup.
Don't have files containing UTF-8 specials handy, but do have some with Latin1 (ISO-8859-1) specials,
convertible to UTF-8.
Stripped common ASCII-only lines from output below.
Default email encoding is Unicode (hopefully UTF-8), not Western (presumably Latin1), so the samples should render accurately.
$ uname -srvmo
CYGWIN_NT-10.0 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
$ locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=C.UTF-8
$ egrep -a 'Deg|LF' latin1.txt # -a needed to override binary assumption - garbled characters
DegN='▒N'
DegW='▒W'
Y2LF='%s▒%s %s %s'
Y2LLF='|▒%.0s|'
LF='|▒'.YFP.'|'
$ iconv -f iso-8859-1 -t utf-8 latin1.txt | egrep 'Deg|LF' # good utf-8 characters
DegN='°N'
DegW='°W'
Y2LF='%s±%s %s %s'
Y2LLF='|±%.0s|'
LF='|±'.YFP.'|'
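A minimal sketch of the same check on a throwaway file, for anyone without a Latin1 file handy. The helper names is_utf8 and latin1_to_utf8 and the temporary file names are made up for illustration, and it assumes iconv exits non-zero on invalid input (true of the usual iconv when -c is not given).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: convert a Latin1 (ISO-8859-1) file to UTF-8 on stdout.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1"
}

# Build a sample containing the degree sign as a raw Latin1 byte (octal 260 = 0xB0).
tmpdir=$(mktemp -d)
printf 'DegN=\260N\n' > "$tmpdir/latin1.txt"
latin1_to_utf8 "$tmpdir/latin1.txt" > "$tmpdir/utf8.txt"

# Report which of the two files is valid UTF-8.
for f in latin1.txt utf8.txt; do
    if is_utf8 "$tmpdir/$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

A lone 0xB0 byte is not a valid UTF-8 sequence, which is why the unconverted file shows up garbled in a UTF-8 terminal.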
Sorry, the above was in mintty, but you used cmd!
Saw problems similar to yours until I set LC_ALL=C.UTF-8 (and LANG for consistency, though that doesn't really matter) and ran chcp 65001.
After that, type and Cygwin commands produce the same output.
Without CP65001 (and a Unicode console font that maps most characters; I use DejaVu Sans Mono everywhere I can), there may be no valid encoding for UTF-8 special characters in your default console code page (437 for US, 850 for non-US, others for localized versions).
Unfortunately, with those settings less displays spaces as squares, so you may have to set PAGER=more for readability.
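The settings above could be collected roughly as follows for a Cygwin shell started inside a Windows console. This is a sketch, not a tested recipe: it assumes chcp.com (the Windows code page utility) is reachable on PATH from the Cygwin shell, and it skips that step where it is not found.

```shell
#!/bin/sh
# Sketch only: apply the settings described in this message.
# Tell Cygwin programs to treat text as UTF-8.
export LC_ALL=C.UTF-8
export LANG=C.UTF-8   # for consistency; LC_ALL already overrides it

# Use more instead of less as the pager.
export PAGER=more

# Assumption: chcp.com is on PATH when running under Windows.
# Switch the console code page to UTF-8 (65001) if the tool is available.
if command -v chcp.com >/dev/null 2>&1; then
    chcp.com 65001
fi
```

From cmd itself the equivalent would be "chcp 65001" followed by "set LC_ALL=C.UTF-8" before running the Cygwin programs.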
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada