This is the mail archive of the cygwin mailing list for the Cygwin project.
Please support CP932. (I have problem using subversion with SJIS)
- From: Nayuta Taga <ganaware at gmail dot com>
- To: cygwin at cygwin dot com
- Date: Sat, 23 Jan 2010 14:49:21 +0900
- Subject: Please support CP932. (I have problem using subversion with SJIS)
Hi all,
Please support CP932. Because CP932 is not the same as SJIS, I have a
problem using Subversion when LANG=ja_JP.SJIS. With the attached
patch and LANG=ja_JP.CP932, Subversion works as expected.
The problem is as follows:
I have the following line in my ~/.subversion/config:
global-ignores = *~
When LANG=ja_JP.UTF-8, Subversion ignores a file named 'foo~'.
But when LANG=ja_JP.SJIS, it does not.
I looked into Subversion and found a workaround.
I added *[U+203E] to the line and saved the file in UTF-8
([U+203E] stands for one character, U+203E OVERLINE):
global-ignores = *~ *[U+203E]
This works fine.
In short, '~' (U+007E TILDE) turns into U+203E (OVERLINE) when
LANG=ja_JP.SJIS.
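To see why the extra pattern helps, here is a minimal C sketch of a global-ignores style check. Subversion does its own glob matching; POSIX fnmatch() only stands in for it here, and is_ignored() is a hypothetical helper. A name that arrives as the bytes "foo\xe2\x80\xbe" no longer matches *~, while the workaround pattern, saved as UTF-8, matches exactly those bytes.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if name matches any glob in the
 * NULL-terminated patterns list, 0 otherwise.  fnmatch() is a stand-in
 * for Subversion's own matcher, not the code Subversion actually runs. */
static int is_ignored(const char *name, const char *const *patterns)
{
    assert(name != NULL && patterns != NULL);
    for (; *patterns != NULL; ++patterns)
        if (fnmatch(*patterns, name, 0) == 0)
            return 1;
    return 0;
}
```

With the list { "*~", NULL }, is_ignored("foo~", ...) returns 1 but is_ignored("foo\xe2\x80\xbe", ...) returns 0; adding "*\xe2\x80\xbe" (the UTF-8 bytes of U+203E) to the list makes the second call return 1 as well.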
Then I looked into Cygwin and Subversion again:
(1) cygwin1.dll converts L"foo~" (UCS-2) to "foo~" (CP932).
(2) Because Subversion uses UTF-8 internally,
"foo~" (CP932) must be converted to "foo~" (UTF-8).
(3) Subversion uses iconv to convert from *SJIS* to UTF-8,
because nl_langinfo(CODESET) returns "SJIS" when LANG=ja_JP.SJIS.
(4) The final string is "foo\xe2\x80\xbe"
(e2 80 be is the UTF-8 representation of U+203E).
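Steps (3) and (4) come down to how each codeset decodes the single byte 0x7E. The sketch below is a toy model, not libiconv: it handles only bytes 0x00-0x7F and assumes that "SJIS" decodes them as JIS X 0201 Roman (0x5C is U+00A5 YEN SIGN, 0x7E is U+203E OVERLINE) while "CP932" decodes them as plain ASCII, which agrees with the iconv output in the supplement. to_utf8(), decode_low_byte() and encode_utf8() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode a BMP code point (< 0x10000) as UTF-8; returns the byte count. */
static size_t encode_utf8(unsigned cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Assumed single-byte mapping: JIS X 0201 Roman for "SJIS",
 * ASCII for everything else (e.g. "CP932"). */
static unsigned decode_low_byte(unsigned char b, int jis_roman)
{
    if (jis_roman && b == 0x5C)
        return 0x00A5; /* YEN SIGN */
    if (jis_roman && b == 0x7E)
        return 0x203E; /* OVERLINE */
    return b;          /* identical to ASCII otherwise */
}

/* Toy converter: returns the UTF-8 length, or -1 for any byte >= 0x80
 * (double-byte and katakana ranges are out of scope) or if out is full. */
static long to_utf8(const char *codeset, const char *in,
                    unsigned char *out, size_t outsize)
{
    assert(outsize > 0);
    int jis_roman = strcmp(codeset, "SJIS") == 0;
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        if (*p >= 0x80 || outsize - n < 4) /* up to 3 bytes plus the NUL */
            return -1;
        n += encode_utf8(decode_low_byte(*p, jis_roman), out + n);
    }
    out[n] = '\0';
    return (long)n;
}
```

In this model, to_utf8("SJIS", "foo~", buf, sizeof buf) produces the six bytes 66 6f 6f e2 80 be, while to_utf8("CP932", ...) produces 66 6f 6f 7e.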
With my patch, I can use LANG=ja_JP.CP932, for which nl_langinfo(CODESET)
returns "CP932", so the final string is "foo~".
Supplement:
$ echo -n foo~ | iconv -f CP932 -t UTF-8 | od -t x1 -t a
0000000 66 6f 6f 7e
f o o ~
0000004
$ echo -n foo~ | iconv -f SJIS -t UTF-8 | od -t x1 -t a
0000000 66 6f 6f e2 80 be
f o o ? 80 ?
0000006
--
TAGA Nayuta <ganaware@gmail.com>