This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Cygwin fails to utilize Unicode replacement character


On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
No idea what you consider dangerous. Anyway, we obviously agree that hardly any available console font supports the REPLACEMENT CHARACTER. You had previously suggested code that might work (using CreateFont(0, 0, ....)). Maybe you can sort out with Corinna how to get that to work inside Cygwin. Otherwise, my opinion:
- *working* fallback from FFFD to 2592: good

I am fine with this, but I think Corinna feels it is too much code for not
enough benefit. That's her decision.
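
For illustration only, here is a minimal sketch of what a "working fallback"
could look like, written in Python rather than the C that Cygwin actually uses.
The names pick_replacement and font_has_glyph are hypothetical and do not come
from Cygwin or mintty; the glyph check is assumed to be supplied by the caller.

```python
# Hypothetical sketch: pick the first candidate code point the font can draw.
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD
MEDIUM_SHADE = 0x2592           # U+2592


def pick_replacement(font_has_glyph,
                     candidates=(REPLACEMENT_CHARACTER, MEDIUM_SHADE)):
    """Return the first code point in `candidates` for which
    `font_has_glyph(code_point)` is true, or None if none is supported."""
    for code_point in candidates:
        if font_has_glyph(code_point):
            return code_point
    return None


# A made-up font, modelled as the set of code points it covers:
# it lacks U+FFFD but has U+2592.
fake_font = {0x0041, 0x2592}
chosen = pick_replacement(lambda cp: cp in fake_font)
print(f"U+{chosen:04X}")
```

The real cost is in implementing the glyph check against the console font,
which this sketch deliberately leaves out.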

- fix FFFD: not good, because the .notdef glyph is not an appropriate indication of illegal encoding (like broken UTF-8 bytes)

Not sure what you even mean by this. FFFD doesn't need fixing; Windows just
needs to adopt some fonts with proper Unicode support. We are dealing with
their failure to do that.

the .notdef glyph is not an appropriate indication of illegal encoding (like
broken UTF-8 bytes)

True, but neither is U+2592. As far as I know, U+2592 is not officially defined
anywhere as representing anything other than "MEDIUM SHADE".
Corinna originally added it in 2009:

http://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git&a=commitdiff&h=161211d

with no justification that I can find for why it was chosen. Similarly, mintty
changed from U+FFFD to U+2592 in 2009:

http://github.com/mintty/mintty/commit/90c11d3

with a good reason, which was to avoid ambiguity with fonts that didn't have
U+FFFD. But again, no reason was given for choosing U+2592. I personally see
both sides of the argument, but I tend to land on the side of a standard if one
exists. Here is the standard for U+FFFD:

http://unicode.org/charts/nameslist/n_FFF0.html
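
As a quick illustration (a Python sketch, not anything taken from Cygwin or
mintty), the standard library's unicodedata module reports the official names
of the code points discussed in this thread, and Python's own UTF-8 decoder
substitutes U+FFFD for a broken byte when asked to replace errors:

```python
import unicodedata

# Official Unicode names of the code points discussed in this thread.
names = {cp: unicodedata.name(chr(cp)) for cp in (0xFFFD, 0x2592, 0x25A1)}
for cp, name in names.items():
    print(f"U+{cp:04X} {name}")

# 0xFF can never appear in valid UTF-8, so the decoder replaces it with U+FFFD.
decoded = b"ab\xffcd".decode("utf-8", errors="replace")
print(ascii(decoded))
```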

- revert to 2592: OK

If we were to use something other than U+FFFD, I would propose U+25A1, as it is
also defined by Unicode:

   25A1	 □ 	White Square
   •	may be used to represent a missing ideograph

http://unicode.org/charts/nameslist/n_25A0.html

and it has better console font support than U+FFFD:

   yes:
   - Consolas
   - Courier New
   - DejaVu Sans Mono
   - MS Gothic
   - NSimSun

   no:
   - Lucida Console
   - SimSun-ExtB
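
The list above can be written down as data. This is only a sketch: the
dictionary name FONT_HAS_WHITE_SQUARE is made up for illustration, and the
values are simply the yes/no results listed above, not the output of any real
font query.

```python
# U+25A1 (WHITE SQUARE) coverage per font, copied from the list above.
FONT_HAS_WHITE_SQUARE = {
    "Consolas": True,
    "Courier New": True,
    "DejaVu Sans Mono": True,
    "MS Gothic": True,
    "NSimSun": True,
    "Lucida Console": False,
    "SimSun-ExtB": False,
}

supported = sorted(f for f, ok in FONT_HAS_WHITE_SQUARE.items() if ok)
missing = sorted(f for f, ok in FONT_HAS_WHITE_SQUARE.items() if not ok)

print(f"{len(supported)} of {len(FONT_HAS_WHITE_SQUARE)} fonts have U+25A1")
print("missing:", ", ".join(missing))
```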



