Re: cygwin + GetConsoleOutputCP

On 3/21/2011 3:53 AM, Andy Koppe wrote:
> I think defaulting to the console codepage makes sense for the DOS
> side of the conversion. Having said that, Windows files that aren't
> "Unicode", i.e. UTF-16, are usually encoded in the so-called ANSI
> codepage, e.g. CP1252, so it would make more sense to default to that.
> However, the real problem with this feature is that the Unix side of
> the conversion is fixed to ISO-8859-1, which makes it near-useless
> when Cygwin defaults to UTF-8. And it's no use for non-Western
> European languages in any case.

Meh...the same basic set of options/conversions is provided if unix2dos
is compiled on linux.  Only there, the "offending" function is
implemented as:

unsigned short query_con_codepage(void) {

However, each time query_con_codepage is called, it is followed by:
    if ([return value of query_con_codepage] < 2)
        pFlag->ConvMode = CONVMODE_437;

IOW, on linux, when using -iso with no specific code page, it acts just
as if you had simply specified -437 for the "dos" side; the "unix" side
is still, as always, iso-8859-1.
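
To make that fallback concrete, here is a rough sketch of the logic as I
understand it. Everything prefixed sketch_/SKETCH_ is a hypothetical
stand-in, not the actual dos2unix source: the stub returning 0 and the
numeric value of the 437 constant are both assumptions made purely for
illustration.

```c
/* Hypothetical stand-in for CONVMODE_437; the real value may differ. */
#define SKETCH_CONVMODE_437 437

/* Assumed non-Windows stub: no console to ask, so report "unknown" as 0. */
static unsigned short sketch_query_con_codepage(void)
{
    return 0;
}

/* Choose the "dos" side mode from a queried code page.
 * Any result below 2 is treated as "no usable answer" and
 * falls back to code page 437, mirroring the check quoted above. */
static int sketch_default_dos_mode(unsigned short queried)
{
    int mode = (int)queried;

    if (mode < 2)
        mode = SKETCH_CONVMODE_437;
    return mode;
}
```

With the stub above, sketch_default_dos_mode(sketch_query_con_codepage())
always lands on the 437 fallback, whereas a platform that reports a real
code page (say 850) would keep it.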

> A worthwhile conversion feature would use
> MultiByteToWideChar()/WideCharToMultiByte() defaulting to the system's
> ANSI codepage on the DOS side, and mbstowcs()/wcstombs() defaulting to
> the charset specified by the LC_CTYPE locale category on the Unix
> side.

Well, if you want full-featured charset conversion, then that's what
iconv(1) is for.  These added features of dos2unix/unix2dos are, in
reality, quick and dirty approaches to *single byte* charset conversion
for a *limited set* of charsets.
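
By "quick and dirty" I mean something along these lines: a fixed
byte-for-byte mapping from one DOS code page to iso-8859-1. This is only
a sketch with hypothetical names; it lists just a handful of CP437
entries, and replacing unmapped bytes with '?' is my choice for the
example, not necessarily what dos2unix does.

```c
#include <stddef.h>

/* Map one CP437 byte to ISO-8859-1 (partial table, illustration only). */
static unsigned char sketch_cp437_to_latin1(unsigned char c)
{
    if (c < 0x80)
        return c;               /* ASCII range is identical in both */

    switch (c) {
    case 0x81: return 0xFC;     /* u with diaeresis */
    case 0x82: return 0xE9;     /* e with acute */
    case 0x84: return 0xE4;     /* a with diaeresis */
    case 0x8E: return 0xC4;     /* A with diaeresis */
    case 0x94: return 0xF6;     /* o with diaeresis */
    default:   return '?';      /* not covered by this sketch */
    }
}

/* Convert a buffer in place; one byte in, one byte out. */
static void sketch_convert_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = sketch_cp437_to_latin1(buf[i]);
}
```

Because it is strictly one byte in, one byte out, this style of
conversion cannot target a multibyte encoding such as UTF-8.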

I'm not looking to re-implement the whole thing or modify the semantics
of the options. (Or even add a new set of options.) I'm just trying to
make sure that, given the existing semantics of the options,
dos2unix selects the proper default CP for the "dos" side -- using
whatever is considered the definitive source for the current "dosish"
active codepage on the cygwin platform -- when the existing options are


