Suffixes in non-western charsets

Corinna Vinschen
Mon Jan 28 12:42:00 GMT 2008


Sorry for my ignorance, but I realized that I have no idea how file
suffixes are handled when working in a non-western charset environment.
What I'm wondering about is this:

When you're using a Latin-based charset like ASCII or ISO-8859-1, the
suffixes used, for instance, for executables or shortcuts are always
the same.  An executable has ".exe" or ".com", a shortcut has ".lnk", a
batch file ".bat", and so on.

How does that work in non-Latin charsets such as Cyrillic, Chinese, or
Japanese?  Are these suffixes in some way translated into the non-Latin
charset?  If so, how?

Given that NTFS uses UTF-16, it would be possible to keep the Latin
characters of the suffix as part of the filename.  So, if I try to find
out whether a path name is a batch file, the comparison with L".bat"
would still be valid.  But does it actually work this way?
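
For reference, the wide-character comparison meant here could look
roughly like the sketch below.  It assumes the suffix really is stored
as plain Latin characters, which is exactly the open question.  The
names ascii_lower and has_suffix_w are made up for illustration, and
only ASCII A-Z is case-folded.

```c
#include <stddef.h>
#include <wchar.h>

/* Fold only ASCII A-Z to lower case; every other character is
   returned unchanged. */
static wchar_t ascii_lower(wchar_t c)
{
    return (c >= L'A' && c <= L'Z') ? (wchar_t)(c - L'A' + L'a') : c;
}

/* Hypothetical helper: return 1 if path ends with suffix, comparing
   case-insensitively for ASCII letters only, else 0. */
int has_suffix_w(const wchar_t *path, const wchar_t *suffix)
{
    size_t plen = wcslen(path);
    size_t slen = wcslen(suffix);
    size_t i;

    if (slen > plen)
        return 0;
    path += plen - slen;            /* point at the candidate suffix */
    for (i = 0; i < slen; i++)
        if (ascii_lower(path[i]) != ascii_lower(suffix[i]))
            return 0;
    return 1;
}
```

A call such as has_suffix_w(name, L".bat") would then only succeed if
the trailing characters of the name are the Latin ones, no matter what
the rest of the name contains.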

FAT uses the system OEM charset.  Many applications are still using
single/multi-byte functions.  So, how does it work?  Are the suffixes
fixed by always using the same byte values, regardless of what those
byte values mean in the charset in use?  Or are they translated to
characters which have some similarity with the Latin characters the
suffixes are based on?  Would the "usual" comparison work after
converting the filename to UTF-16 (as with L".bat")?
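
The convert-then-compare step could be sketched as below.  This is only
an illustration under assumptions: it uses the standard C mbstowcs,
which converts according to the current locale, as a stand-in for
whatever conversion is actually appropriate (on Windows that would
presumably be MultiByteToWideChar with the OEM codepage).  The name
is_batch_name_mb and the buffer size are made up, and the comparison
is case-sensitive to keep the sketch short.

```c
#include <stdlib.h>
#include <wchar.h>

#define NAME_MAX_WCHARS 260   /* illustrative buffer size, an assumption */

/* Hypothetical helper: convert a multibyte file name to wide characters
   using the current locale, then compare its last four characters with
   L".bat".  Returns 1 on match, 0 on no match, -1 if the conversion
   fails or the name does not fit the buffer. */
int is_batch_name_mb(const char *name)
{
    wchar_t wide[NAME_MAX_WCHARS];
    size_t n = mbstowcs(wide, name, NAME_MAX_WCHARS);

    if (n == (size_t)-1 || n >= NAME_MAX_WCHARS)
        return -1;                /* invalid sequence or name too long */
    if (n < 4)
        return 0;                 /* too short to end in ".bat" */
    return wcscmp(wide + n - 4, L".bat") == 0;
}
```

Whether this gives the right answer depends entirely on whether the
suffix bytes convert to the Latin characters in the charset in use,
which is the point of the question.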

Can anybody enlighten me here?


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
