[PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Thomas Wolff towo@towo.net
Fri Sep 11 15:18:37 GMT 2020


On 11.09.2020 at 17:10, Thomas Wolff wrote:
> On 11.09.2020 at 16:06, Corinna Vinschen wrote:
>> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
>>> Hi Corinna,
>>>
>>> On Fri, 11 Sep 2020 14:08:40 +0200
>>> Corinna Vinschen wrote:
>>>> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
>>>>> - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
>>>>>    for the case that the multibyte char is split in the middle.
>>>>>    The reason is as follows.
>>>>>    * ISO-2022 is too complicated to handle correctly.
>>>>>    * Not sure what to do with ISCII.
>>>>> ---
>>>>>   winsup/cygwin/fhandler_tty.cc | 9 +++++++--
>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
>>>>> index 37d033bbe..ee5c6a90a 100644
>>>>> --- a/winsup/cygwin/fhandler_tty.cc
>>>>> +++ b/winsup/cygwin/fhandler_tty.cc
>>>>> @@ -117,6 +117,9 @@ CreateProcessW_Hooked
>>>>>     return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>>>>>   }
>>>>>
>>>>> +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
>>>>> +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
>>>>> +
>>>>>   static void
>>>>>   convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>>           UINT cp_from, const char *ptr_from, size_t len_from,
>>>>> @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>>     tmp_pathbuf tp;
>>>>>     wchar_t *wbuf = tp.w_get ();
>>>>>     int wlen = 0;
>>>>> -  if (cp_from == CP_UTF7)
>>>>> -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>>          Therefore, just convert string without checking */
>>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>>                   wbuf, NT_MAX_PATH);
>>>>> -- 
>>>>> 2.28.0
>>>> I'd prefer to not handle them at all.  We just don't support these
>>>> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
>>>> compatible.  Let's please just drop any handling for these weird
>>>> or outdated codepages.
>>> What do you mean by "just drop any handling"?
>>>
>>> Do you mean removing the following if block?
>>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>>          Therefore, just convert string without checking */
>>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>>                   wbuf, NT_MAX_PATH);
>>> In this case, the conversion for ISO-2022, ISCII and UTF-7 will
>>> not be done correctly.
>>>
>>> Or should we skip charset conversion if the codepage is EBCDIC, ISO-2022
>>> or ISCII? What should we do for UTF-7?
>> Nothing, just like for any other of these weird charsets. Cygwin never
>> supported any charset which wasn't at least ASCII compatible in the
>> 0 <= x <= 127 range.
> Actually, in Shift-JIS (CP932, supported via locale ja_JP.sjis), 0x5C 
> is ¥ :/
... or maybe not, as explained in 
https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)#Single-byte_character_differences. 
Terrible.
>>    Just ignore them and the possibility that a
>> user chooses them for fun.
>>
>>> What should happen if the user or an app changes the codepage to one of them?
>> Garbage output, I guess.  We shouldn't really care.
>>
>>
>> Corinna
>
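
The commit message refers to a multibyte character "split in the middle", i.e. a write whose buffer ends partway through a character. The following is a minimal C++ sketch of that problem, assuming UTF-8 input only; `complete_utf8_prefix` is a hypothetical helper, not the Cygwin implementation (judging from the comment in the patch, `convert_mb_str()` detects the split via `MultiByteToWideChar` with `MB_ERR_INVALID_CHARS` instead). It returns how many leading bytes end on a character boundary; the leftover bytes would be held back for the next write.

```cpp
#include <cstddef>

// Illustration for UTF-8 only (hypothetical helper, not Cygwin code).
// Returns how many leading bytes of buf end on a character boundary.
// Any remaining bytes (at most 3) start a character whose tail has not
// arrived yet; a caller would keep them and prepend them to the next write.
std::size_t complete_utf8_prefix (const char *buf, std::size_t len)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (buf);

  // Step back over up to 3 trailing continuation bytes (10xxxxxx).
  std::size_t i = len;
  std::size_t back = 0;
  while (i > 0 && back < 3 && (p[i - 1] & 0xC0) == 0x80)
    {
      --i;
      ++back;
    }
  if (i == 0)
    return len;  // empty input or no lead byte: leave it to the converter

  // p[i - 1] is the lead byte of the last character; derive its length.
  unsigned char lead = p[i - 1];
  std::size_t need;
  if (lead < 0x80)
    need = 1;
  else if ((lead & 0xE0) == 0xC0)
    need = 2;
  else if ((lead & 0xF0) == 0xE0)
    need = 3;
  else if ((lead & 0xF8) == 0xF0)
    need = 4;
  else
    return len;  // invalid lead byte: also left to the converter

  std::size_t have = len - (i - 1);  // bytes of the last character present
  return have < need ? i - 1 : len;
}
```

For example, for the bytes `a\xE3\x81` (an ASCII letter followed by the first two bytes of a three-byte character) the helper returns 1. Scanning backwards from the end works because UTF-8 is self-synchronizing. ISO-2022 is not: it switches character sets with escape sequences, so whether a byte is part of a multibyte character depends on state set earlier in the stream, which is one reason the patch exempts it.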


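The exclusion added by the patch boils down to a predicate on the codepage number. Below is a minimal C++ sketch, assuming a standalone build without `<windows.h>`: `codepage_t` stands in for `UINT`, `kCpUtf7` for `CP_UTF7` (65000 on Windows), and `skips_split_char_check` is a hypothetical name. The ranges are copied from the `IS_ISO_2022` and `IS_ISCII` macros in the patch.

```cpp
#include <cstdint>

using codepage_t = std::uint32_t;  // stands in for the Win32 UINT type

constexpr codepage_t kCpUtf7 = 65000;  // value of CP_UTF7 on Windows

// Same inclusive ranges as the IS_ISO_2022 and IS_ISCII macros.
constexpr bool is_iso_2022 (codepage_t cp) { return cp >= 50220 && cp <= 50229; }
constexpr bool is_iscii (codepage_t cp) { return cp >= 57002 && cp <= 57011; }

// Hypothetical helper: true when the patched convert_mb_str() would convert
// the whole buffer in one MultiByteToWideChar call without
// MB_ERR_INVALID_CHARS, i.e. without looking for a split character.
constexpr bool skips_split_char_check (codepage_t cp)
{
  return cp == kCpUtf7 || is_iso_2022 (cp) || is_iscii (cp);
}
```

Shift-JIS (932) and UTF-8 (65001) fall outside all three cases and keep the split-character check. This predicate only restates the patch's condition; per Corinna's reply, these codepages are not ASCII compatible and are not supported by Cygwin in any case.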
