This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: Filenames with Win32 special characters (or: Interix filename compatibility)
Corinna Vinschen wrote:
> We could enhance the method to handle uppercase ASCII chars as well.
> Managed mounts could use the same method as normal mounts, just with
> upper case ASCII chars transformed, too.
>
> This would have the additional advantage that filenames on managed
> mounts not only look almost normal, the length of the real path
> also isn't changed due to the char transformation, like it is today.
Interesting. The unchanged length sounds nice, but I'm not sure I
follow about looking almost normal. Any filename with uppercase
characters would still look unintelligible in Explorer/any ANSI Win32
app, wouldn't it?
Here's an alternative idea for the encoding. What if we encode uppercase
letters as themselves plus a rare combining character? For example,
there's a block at U+FE00 - U+FE0F whose characters are called simply
VARIATION SELECTOR-1 through VARIATION SELECTOR-16:
<http://www.fileformat.info/info/unicode/block/variation_selectors/list.htm>.
*experiments*
Well crap, those don't work very well: they display as boxes rather than
combining with the preceding letter. But going through the entire list of
combining characters, I did find one with an interesting property:
U+0331 COMBINING MACRON BELOW. When displayed in Explorer, it looks like
the normal letter with a small underline. But the neat property of this
character is that when converted from Unicode to cp1252 it converts to
the underscore, meaning stupid ANSI programs can still edit/open/save
these files. So we'd encode uppercase ASCII as simply 'A' -> "A\x0331",
'B' -> "B\x0331" and so on. This doesn't keep the filename the same
length, but the names still remain intelligible in dumb apps.
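
To make the mapping concrete, here's a rough sketch in Python (not Cygwin
code). The function names are made up for illustration, and ansi_view()
only imitates the cp1252 conversion I saw by swapping U+0331 for '_'; it
does not call the actual Windows conversion.

```python
# Hypothetical sketch of the proposed encoding; names are illustrative only.
MARK = "\u0331"  # COMBINING MACRON BELOW


def is_ascii_upper(ch):
    """True only for the 26 uppercase ASCII letters."""
    return "A" <= ch <= "Z"


def encode_name(name):
    """Append the marker after every uppercase ASCII letter."""
    out = []
    for ch in name:
        out.append(ch)
        if is_ascii_upper(ch):
            out.append(MARK)
    return "".join(out)


def decode_name(stored):
    """Drop exactly one marker after each uppercase ASCII letter."""
    out = []
    i = 0
    while i < len(stored):
        ch = stored[i]
        out.append(ch)
        if is_ascii_upper(ch) and i + 1 < len(stored) and stored[i + 1] == MARK:
            i += 1  # skip the marker that encode_name() added
        i += 1
    return "".join(out)


def ansi_view(stored):
    """Assumption: imitate the observed cp1252 result, marker -> underscore."""
    return stored.replace(MARK, "_")
```

With this, encode_name("Foo") and encode_name("foo") differ by more than
case, so they can coexist on a case-insensitive filesystem, and
ansi_view(encode_name("Foo")) gives "F_oo". The stored name grows by one
character per uppercase letter, which is the length cost mentioned above.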
(BTW, for a real hoot try creating a filename containing U+034F
COMBINING GRAPHEME JOINER.)
Brian