Cygwin fails to utilize Unicode replacement character

Thomas Wolff towo@towo.net
Sat Sep 1 21:07:00 GMT 2018


On 01.09.2018 at 20:46, Steven Penny wrote:
> On Sat, 1 Sep 2018 20:11:15, Thomas Wolff wrote:
>> Which terminals are used and what's the output of `locale` and `cat 
>> --version` in both cases?
>
> ...
>
> Note that in addition to Linux, Windows PowerShell also gives correct 
> output:
>
>    $ pwsh -c '[system.text.encoding]::UTF8.getString(0xEB)'
>    �
What makes you claim this would be the "correct output"? Where is this 
defined?

> compare again with Cygwin:
>
>    $ printf '\xEB'
>    ▒
Actually, mintty no longer displays MEDIUM SHADE here. Please compare.
There is also a problem with using MEDIUM SHADE: in an ambiguous-width 
locale (or in explicit ambiguous-width terminal mode), that character is 
double-width and is therefore not suitable as a replacement for a single 
illegal UTF-8 byte.
The Cygwin console does not support double-width, so it does not have this 
problem, but until this is clarified further I think I will not change it 
in mintty.
Thomas
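
As a rough illustration of the two points above (not something taken from mintty or Cygwin), the following Python sketch decodes the lone byte 0xEB with Python's own "replace" error handler, and looks up the East Asian Width class of MEDIUM SHADE (U+2592). The variable names are made up for this example, and the width class reported depends on the Unicode data shipped with the Python interpreter.

```python
import unicodedata

# A lone 0xEB byte is a UTF-8 lead byte with no continuation bytes,
# so on its own it is an ill-formed sequence.
lone_byte = b"\xEB"

# Python's "replace" error handler substitutes U+FFFD for the bad byte.
# This shows Python's behaviour only, not what any terminal does.
decoded = lone_byte.decode("utf-8", errors="replace")

# MEDIUM SHADE, the character discussed as a possible stand-in.
MEDIUM_SHADE = "\u2592"

# "A" means ambiguous width: narrow in most contexts, but rendered
# double-width in ambiguous-width (East Asian) locales or terminal modes.
width_class = unicodedata.east_asian_width(MEDIUM_SHADE)

print(f"decoded: U+{ord(decoded):04X} {unicodedata.name(decoded)}")
print(f"U+2592 {unicodedata.name(MEDIUM_SHADE)}: width class {width_class}")
```

A width class of "A" is what makes the character's column count depend on the locale or terminal mode, which is the concern with using it to stand in for a single byte.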


