[PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().
Takashi Yano
takashi.yano@nifty.ne.jp
Fri Sep 11 12:35:15 GMT 2020
Hi Corinna,
On Fri, 11 Sep 2020 14:08:40 +0200
Corinna Vinschen wrote:
> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > for the case that the multibyte char is split in the middle.
> > The reason is as follows.
> > * ISO-2022 is too complicated to handle correctly.
> > * Not sure what to do with ISCII.
> > ---
> > winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > index 37d033bbe..ee5c6a90a 100644
> > --- a/winsup/cygwin/fhandler_tty.cc
> > +++ b/winsup/cygwin/fhandler_tty.cc
> > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > }
> >
> > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > +
> > static void
> > convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > UINT cp_from, const char *ptr_from, size_t len_from,
> > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > tmp_pathbuf tp;
> > wchar_t *wbuf = tp.w_get ();
> > int wlen = 0;
> > - if (cp_from == CP_UTF7)
> > - /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > + if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > + /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > + - ISO-2022 is too complicated to handle correctly.
> > + - FIXME: Not sure what to do for ISCII.
> > Therefore, just convert string without checking */
> > wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > wbuf, NT_MAX_PATH);
> > --
> > 2.28.0
>
> I'd prefer to not handle them at all. We just don't support these
> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> compatible. Let's please just drop any handling for these weird
> or outdated codepages.
What do you mean by "just drop any handling"?
Do you mean removing the following if block?
> > + if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > + /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > + - ISO-2022 is too complicated to handle correctly.
> > + - FIXME: Not sure what to do for ISCII.
> > Therefore, just convert string without checking */
> > wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > wbuf, NT_MAX_PATH);
In that case, the conversion for ISO-2022, ISCII and UTF-7 will
not be done correctly.
Or do you mean skipping charset conversion if the codepage is EBCDIC,
ISO-2022 or ISCII? What should we do for UTF-7?
What should happen if a user or an app changes the codepage to one of them?
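
To make the second option concrete, here is a rough standalone sketch of
what "skip charset conversion" could look like. It is not part of the
patch. The code page ranges and the UTF-7 value (65000) are the ones used
above; needs_special_handling() and passthrough_if_unsupported() are
hypothetical names, and testing both cp_from and cp_to is only my
assumption about what "skip" would mean. EBCDIC is left out because I have
not listed its code pages here.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned int UINT;	/* stand-in for the Windows UINT type */

/* CP_UTF7 is 65000 on Windows; redefined here so the sketch builds
   without <windows.h>.  */
constexpr UINT SKETCH_CP_UTF7 = 65000;

/* Same ranges as the IS_ISO_2022 and IS_ISCII macros in the patch.  */
constexpr bool is_iso_2022 (UINT cp) { return cp >= 50220 && cp <= 50229; }
constexpr bool is_iscii (UINT cp) { return cp >= 57002 && cp <= 57011; }

/* True for the code pages the patch converts without
   MB_ERR_INVALID_CHARS.  */
constexpr bool
needs_special_handling (UINT cp)
{
  return cp == SKETCH_CP_UTF7 || is_iso_2022 (cp) || is_iscii (cp);
}

/* Hypothetical alternative: if either side is one of those code pages,
   copy the bytes through unchanged and return true.  Otherwise return
   false so the caller runs the normal conversion.
   Assumes ptr_to has room for at least len_from bytes.  */
static bool
passthrough_if_unsupported (UINT cp_to, char *ptr_to, std::size_t *len_to,
			    UINT cp_from, const char *ptr_from,
			    std::size_t len_from)
{
  if (!needs_special_handling (cp_from) && !needs_special_handling (cp_to))
    return false;
  std::memcpy (ptr_to, ptr_from, len_from);
  *len_to = len_from;
  return true;
}
```

With something like this, convert_mb_str() would call the helper first and
fall back to the existing MultiByteToWideChar path only when it returns
false. Whether passing the raw bytes through is acceptable for these
codepages is exactly the open question.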
--
Takashi Yano <takashi.yano@nifty.ne.jp>