Cygwin programs don't support non-ASCII filenames

Lenik lenik@bodz.net
Sat May 9 15:12:00 GMT 2009


(This mail is encoded in utf-8)

On 2009-5-9 18:02, Corinna Vinschen wrote:
> [Repeated and additional question.  I accidentally sent this as PM.
>   Sorry about that.  Let's keep this on the list, please]
>
> On May  9 11:43, Lenik wrote:
>> (My system locale is zh_CN)
>
> What ANSI codepage is that?
>
> And what OEM codepage uses the console Window by default?
`chcp' shows that the codepage is 937.
I don't know the difference between the ANSI codepage and the OEM codepage.

>
>> 1, test path
>>      >>>  set LANG=&  cygpath -am .
>>      C:/Profiles/Shecti/??????
>>
>>      >>>  set LANG=zh_CN.GBK&  cygpath -am .
>>      C:/Profiles/Shecti/??????
>>
>>      >>>  set LANG=C&  cygpath -am .
>>      C:/Profiles/Shecti/×ÀÃæ
>
> Can you please give us the exact name of the directory in either
> UTF-8 or UTF-16 notation?
The two Chinese characters are encoded as:
GB2312: d7 c0 c3 e6
UTF-8: e6 a1 8c e9 9d a2
Unicode: \u684c \u9762
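
As a cross-check of the byte values listed above, here is a minimal sketch using Python's built-in codecs. It is only an illustration and says nothing about how Cygwin converts names internally. The variable names are made up for this example, and it assumes that Python's "gb2312" codec (the EUC-CN form) matches what is meant by GB2312 here.

```python
# Illustration only: encode the directory name in the charsets under
# discussion and compare with the byte values listed in the message.
name = "\u684c\u9762"  # the two characters given as \u684c \u9762

gb2312_bytes = name.encode("gb2312")  # Python's "gb2312" codec is the EUC-CN form
gbk_bytes = name.encode("gbk")
utf8_bytes = name.encode("utf-8")

print("GB2312:", gb2312_bytes.hex(" "))
print("GBK:   ", gbk_bytes.hex(" "))
print("UTF-8: ", utf8_bytes.hex(" "))

# Reading the GB2312 bytes as Latin-1 yields U+00D7 U+00C0 U+00C3 U+00E6,
# the four characters printed in the LANG=C test.
mojibake = gb2312_bytes.decode("latin-1")
print("Latin-1 view matches LANG=C output:",
      mojibake == "\u00d7\u00c0\u00c3\u00e6")
```

The UTF-8 form is six bytes long, which happens to match the six question marks in the other two tests, although nothing in this thread confirms that this is the cause.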

>
>> 2, the `test' utility
>>      >>>  set LANG=&  bash -c "D=$(cygpath -am .); if [ -d $D ]; then echo
>> ok $D; else echo fail $D; fi"
>>      fail C:/Profiles/Shecti/??????
>
> What you're actually testing here all the time is cygpath in the first
> place.  If you stop using cygpath, start a bash shell and use the Cygwin
> commands with the paths in POSIX notation, you would have much less
> trouble.  Cygwin is a POSIX emulation layer, after all.
>
Well, I test the pathnames with cygpath because I want an absolute
path, so that the Chinese characters are included in the test, and I
can't type these characters in the console window. The second reason is
that I have associated the .sh file type with bash, as:
   .sh=C:\lam\sys\cygwin-1.7\bin\bash -c "$(cygpath -u '%0') %*"

Here is a new test that doesn't use cygpath:
     C:\Profiles\Shecti> set LANG=& bash -c "cat 你好"
     cat: 你好: No such file or directory

     C:\Profiles\Shecti> set LANG=zh_CN.GB2312& bash -c "cat 你好"
     cat: 你好: No such file or directory

     C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "cat 你好"
     123

     C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "cat 你好"
     123

     C:\Profiles\Shecti> set LANG=& bash -c "d 你好"
     /mnt/c/Profiles/Shecti/你好 doesn't exist!

     C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "d 你好"
     /mnt/c/Profiles/Shecti/你好 doesn't exist!

     C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "d 你好"
     /mnt/c/Profiles/Shecti/你好 doesn't exist!

`d' gives the same result for every LANG setting. This shows that `cat'
from coreutils handles the locale well, while `d' doesn't.

> If you give me the above information I'll look into fixing cygpath.
>
>>      The GB2312 charset is a subset of GBK charset, and the characters `
>> ??????' is included in GB2312 charset. So in this example, GB2312 SHOULD
>> WORK.
>
> Sorry, no.  It's documented that GBK is supported, GB2312 isn't.  From
> what I read about GB2312 it's not actually a subset of GBK in terms
> of character definitions, it's just a subset in terms of supported
> characters.  AFAICS, GB2312 uses chars<  0x7f in multibyte sequences
> which is not feasible for Cygwin.  We could support EUC-CN, which
> seems to be another way to encode GB2312 chars, but I'm not exactly
> willing to add that now.  I'd rather stabilize what we have now and
> add further charset support in a later, official 1.7 release.
>
> So you can use LANG=zh_CN.GBK, but not LANG=zh_CN.GB2312.  It's just
> treated as invalid input.  Better: Use LANG=zh_CN.UTF-8.
>
Yes, GB2312 is a subset in terms of supported characters. Is there
any way to find out the default locale of the current Cygwin installation?
From the tests I found that `unset LANG' and `set LANG=zh_CN.GB2312' give
the same results, so I thought that GB2312 was the default locale.
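
To illustrate the byte ranges mentioned in the quoted reply, here is a small sketch with Python's codecs. It is not Cygwin's implementation. It assumes that the "chars < 0x7f" remark refers to the 7-bit form of GB2312, which is the EUC-CN form with the high bit of each byte cleared. The helper name `encodes_identically` is hypothetical.

```python
# Illustration only, using Python's codecs (not Cygwin's implementation).
name = "\u684c\u9762"

# EUC-CN form of GB2312: every byte of a two-byte character has the high bit set.
euc_cn = name.encode("gb2312")
# The same code points in the 7-bit form: clear the high bit of each byte.
seven_bit = bytes(b & 0x7F for b in euc_cn)

print("EUC-CN bytes:", euc_cn.hex(" "))
print("7-bit bytes: ", seven_bit.hex(" "), seven_bit)
print("all EUC-CN bytes >= 0xa1:", all(b >= 0xA1 for b in euc_cn))
print("all 7-bit bytes < 0x7f:  ", all(b < 0x7F for b in seven_bit))


def encodes_identically(text):
    """Return True if text has the same bytes under 'gb2312' and 'gbk'."""
    return text.encode("gb2312") == text.encode("gbk")


# The two names used in the tests encode to the same bytes in both charsets.
for sample in ("\u684c\u9762", "\u4f60\u597d"):
    print(ascii(sample), encodes_identically(sample))
```

In the 7-bit form the name becomes plain ASCII letters and punctuation, which would be indistinguishable from ordinary single-byte characters in a pathname.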

I'd like to use UTF-8 too, but I won't chcp to 65001: that would
introduce a lot of new problems when deploying to customers' machines,
since most programs and files in the real world are encoded in GB2312.

Lenik

