This is the mail archive of the
mailing list for the Cygwin project.
Re: Locales with wrong umlauts
On Tue, 28 Mar 2006, Lapo Luchini wrote:
> Igor Peshansky wrote:
> > The system has no idea what charset it's using, because it depends on the
> > font you set for your terminal, which is outside of the terminal's
> > control. Even if you use a Unicode font with charset conversion, the
> > charset is specified outside of the console.
> Oh? I had no idea about that.
> Then the "Arial" distributed in latin1-like CP1252 areas (most of Western
> Europe) is a different font than the "Arial" used in Eastern Europe
> (CP1250 AFAIR?) or the "Arial" used in Cyrillic-using places (CP1251?)?
Nope, the font is probably the same (Unicode/UCS-2), but the encoding
vector is specified in the properties of each terminal window, and thus
not set globally. That said, there may be a system-default encoding (in
the language preferences) that can be used as a good guess for the output
encoding of filenames as converted to 8-bit from UCS-2. In particular, my
Windows is set to accept Russian as one of its primary locales (the main
one being en_US), and thus my non-English filenames are rendered in the
CP1251 encoding (as is evident from xterms trying to display them using a
> Anyway, regarding file names, I don't think it is correct to say that
> the name depends on the font: the "correct" name depends on the system
> default codepage (or, well, since I guess it now uses Unicode underneath,
> let's say "the codepage used for backward compatibility in the non-Unicode
> system calls").
Yep, except I would even say "the correct *rendering* of the name depends
on the default codepage". The name doesn't change if you change the
> If I have a filename with accents I want "ls" to show it "just like
> Explorer", at least by default, with no explicit override on my part
> using .Xdefaults or "rxvt -fn".
Windows terminals use the above system-default encoding. IIRC, xterm and
rxvt use latin1 by default.
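A minimal Python sketch of the mismatch described above, for illustration only. It assumes a hypothetical Russian filename that is converted to 8-bit with CP1251 (the system-default codepage in the Russian-locale setup mentioned earlier) and then displayed by a terminal that assumes latin1, as xterm and rxvt reportedly do by default. Python's codecs stand in for the Windows conversion here. This is not how Cygwin itself is implemented.

```python
# Hypothetical filename, as Windows keeps it internally (Unicode).
name = "Привет.txt"

# Step 1: the 8-bit conversion uses the system-default codepage
# (assumed CP1251, i.e. Russian as a primary locale).
raw = name.encode("cp1251")            # b'\xcf\xf0\xe8\xe2\xe5\xf2.txt'

# Step 2: a terminal that assumes latin1 reads the same bytes as
# Western European letters, producing garbage ("mojibake").
shown_latin1 = raw.decode("latin-1")   # 'Ïðèâåò.txt'

# Step 3: a terminal whose font/encoding matches CP1251 shows the real name.
shown_cp1251 = raw.decode("cp1251")    # 'Привет.txt'
```

The bytes never change between steps 2 and 3. Only the terminal's idea of the encoding differs, which is why the "correct" display depends on matching the codepage rather than on the name itself.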
> OK, maybe I prefer to use a CP850 font like LucidaP because I want to
> see line-drawing characters in "mc", and thus every accent will be messed
> up, but that's another matter 0=)
So, in this case, the encoding vector is part of the font. And no Windows
API call will identify this vector for you so that OUTPUT_CHARSET can be
set in the terminal...
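To show how an encoding vector built into a font changes what a byte means, here is a small Python sketch. It is an illustration under assumptions and does not use any Windows API. It decodes the same byte with CP1252 and with CP850, and checks the horizontal box-drawing character that a program like "mc" needs, which exists in CP850 but not in CP1252.

```python
# An accented letter as written by a program using CP1252.
accent = "è"
raw = accent.encode("cp1252")          # b'\xe8'

# A console font whose encoding vector is CP850 shows that byte differently.
as_cp850 = raw.decode("cp850")         # 'Þ'

# Conversely, the horizontal line used for boxes is byte 0xC4 in CP850...
box = "\u2500"                          # BOX DRAWINGS LIGHT HORIZONTAL
box_raw = box.encode("cp850")          # b'\xc4'

# ...but the same byte is an accented capital letter in CP1252.
as_cp1252 = box_raw.decode("cp1252")   # 'Ä'

# CP1252 has no line-drawing characters at all, so encoding fails.
try:
    box.encode("cp1252")
    cp1252_has_box = True
except UnicodeEncodeError:
    cp1252_has_box = False
```

So with a CP850 font, line drawings come out right and CP1252 accents come out wrong, and with a CP1252 font the reverse happens. One font cannot serve both byte meanings.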
> > Is there any way to tell mv, rm &co to display non-ASCII characters in
> > filenames? I know this isn't Cygwin-specific, but I'm not even sure what
> > to Google for.
> Ohh, us poor non-ASCII-using people, don't you know it is just plain
> wrong to use "strange accents" in filenames? Even more "wrong" is
> starting a filename with a dot or (what horror) using an extension more than 3
> chars long! (just kidding ^_^)
Yes. Languages with different alphabets have a long history of
transliteration on the Internet, specifically because i18n became
widespread not too long ago (relatively speaking, of course).
> Let's not blame Cygwin too much; many Windows apps have problems with
> Unicode. E.g., if I create a folder whose name contains Japanese characters,
> most applications are not even able to save a file in it.
I'm not blaming Cygwin. If anything, I'm blaming newlib... J/K. :-)
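A rough Python sketch of why non-Unicode programs trip over such names. It assumes a hypothetical folder name and uses Python's `errors="replace"` as a stand-in for the lossy substitution that an 8-bit (codepage) conversion may perform. Real Windows conversions may also apply best-fit mappings, so treat this as an approximation rather than a description of any particular API.

```python
# Hypothetical folder name containing Japanese characters.
folder = "日本語"

# Windows stores filenames as 16-bit Unicode (the UCS-2 mentioned above),
# so a Unicode-aware program can round-trip the name exactly.
utf16 = folder.encode("utf-16-le")
roundtrip_ok = utf16.decode("utf-16-le") == folder

# A program limited to an 8-bit Western codepage (CP1252) cannot represent
# these characters; errors="replace" substitutes b"?" for each one.
ansi = folder.encode("cp1252", errors="replace")
print(ansi)  # b'???'

# The loss depends on which codepage is the default:
# "café" survives CP1252 but loses its accent under CP1251.
western = "café"
print(western.encode("cp1252"))                    # b'caf\xe9'
print(western.encode("cp1251", errors="replace"))  # b'caf?'
```

A path built from `b'???'` no longer names the real folder, which is consistent with applications that only use the non-Unicode calls failing to save into it.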
|\ _,,,---,,_ email@example.com | firstname.lastname@example.org
ZZZzz /,`.-'`' -. ;-;;,_ Igor Peshansky, Ph.D. (name changed!)
|,4- ) )-,_. ,\ ( `'-' old name: Igor Pechtchanski
'---''(_/--' `-'\_) fL a.k.a JaguaR-R-R-r-r-r-.-.-. Meow!
"Las! je suis sot... -Mais non, tu ne l'es pas, puisque tu t'en rends compte."
"But no -- you are no fool; you call yourself a fool, there's proof enough in
that!" -- Rostand, "Cyrano de Bergerac"
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html