Fw: File name too long problem -- maybe fix coming?

Gregg Tavares unison@greggman.com
Thu Jan 10 00:56:00 GMT 2008


> From: Corinna Vinschen <corinna-cygwin@cygwin.com>
> 
> On Jan 8 10:47, Gregg Tavares wrote:
> > I did take time to look at the code in CVS which is why I posted. The
> > code in CVS has PATH_MAX set to 32760. My point is that's NOT going to
> > work. The 32760 limit in NT is for UTF-16 paths. Those paths have to
> > get converted to and from the current codepage. In any particular code
> > page they will get BIGGER than 32760 characters. In order for cygwin to
> > actually handle long filenames PATH_MAX has to be set to 131040 because
> > if you get a 32760 UTF-16 path out of FindFileNextExW and you call
> > WideCharToMultiByte the resulting string will be up to 131040 bytes
> > long.
> 
> Standard line breaks in mails would be really helpful, especially for
> users with textmode clients. 76 chars/line are usually fine. It also
> allows to avoid full-quotes very conveniently. Thank you.
> 

Sorry about that.

> I don't think we want to go down this road. 32760 characters as
> multibyte path, in the worst case 8K characters, should be really
> enough. Keep in mind how long this path already is. PATH_MAX on Linux
> is 4096. Another problem is that a PATH_MAX which is bigger than 32K
> is totally wrong for all users using single-byte character sets.

I only brought this up because the goal I was told was for cygwin to handle
long paths, and I'm just pointing out that setting PATH_MAX to 32k will not
do that in all cases. Setting it to 1040 would work for my personal cases,
so I don't have any issue with that. I was just pointing out that 32k
doesn't actually solve the long filename problem.
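
As a rough illustration of the arithmetic in this thread, here is a small
Python sketch (not cygwin code; the helper names are made up for the
example). It counts UTF-16 code units versus encoded bytes and multiplies
the 32760 figure by an assumed worst-case factor. The factor of 4 is the
one used earlier in this thread; for UTF-8 specifically a single UTF-16
code unit expands to at most 3 bytes, so the real bound depends on the
target codepage.

```python
NT_PATH_UNITS = 32760  # path limit in UTF-16 code units, as quoted in this thread


def utf16_units(s):
    """Return the number of UTF-16 code units needed to store s."""
    return len(s.encode("utf-16-le")) // 2


def worst_case_bytes(units, bytes_per_unit):
    """Upper bound on the multibyte size, given an assumed expansion factor."""
    return units * bytes_per_unit


# Ten copies of HIRAGANA LETTER A: one UTF-16 unit each, three UTF-8 bytes each.
sample = "\u3042" * 10
print(utf16_units(sample), len(sample.encode("utf-8")))

# The factor of 4 is the assumption from the thread, not a measured value.
print(worst_case_bytes(NT_PATH_UNITS, 4))
```

With a factor of 4 the bound is 131040 bytes; with a factor of 3 it would be
98280 bytes. Either way it is larger than 32760.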

The problem I'm trying to solve is that if you run rsync/ftp/unison or any
other program between 2 cygwin computers, it would be nice if it didn't fail
on long names. It currently does fail, specifically because in 1.5
PATH_MAX is set to 260 and many multi-byte filenames are easily
longer than 260 bytes.
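
To make the 260-byte point concrete, here is a minimal Python sketch. The
path, the `fits` helper and the choice of code page 932 as the multibyte
encoding are assumptions for the example, not anything taken from cygwin.
It shows a path that is well under 260 characters but over 260 bytes once
encoded.

```python
OLD_PATH_MAX = 260  # PATH_MAX in cygwin 1.5, per this thread


def fits(path, encoding, limit=OLD_PATH_MAX):
    """True if the encoded path plus its terminating NUL fits in limit bytes."""
    return len(path.encode(encoding)) + 1 <= limit


# Hypothetical path: 11 ASCII chars + 150 Japanese chars + 4 ASCII chars.
name = "/home/user/" + "\u65e5\u672c\u8a9e" * 50 + ".txt"

print(len(name))                  # length in characters
print(len(name.encode("cp932")))  # length in bytes, 2 bytes per Japanese char
print(len(name.encode("utf-8")))  # length in bytes, 3 bytes per Japanese char
print(fits(name, "cp932"), fits(name, "utf-8"))
```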

> 
> > The only other change I want
> > to make is the CP_ACP parameter which uses the current codepage, It
> > will default to CP_ACP but I want to make it user settable so it can
> > be set to CP_UTF8 which will solve the other problems I mentioned.
> 
> What is the standard ANSI codepage on typical multibyte Windows
> installations? I thought that UTF-8 is the standard ANSI CP for
> Japanese or Chinese nowadays?

The standard ANSI codepage is different per language. On Japanese
Windows it is code page 932 (Shift_JIS). This is for compatibility with
older software.

AFAIK you can't actually set the Windows codepage to UTF-8. All you
can do is use CP_UTF8 in calls that explicitly take a codepage, like
MultiByteToWideChar for example.

> Also, when you use CYGWIN=codepage:oem,
> you can use UTF-8 by changing the console codepage, isn't it? That's
> why I didn't add extra UTF-8 handling so far.

No, unfortunately that will not set cygwin to use UTF-8 for filenames.

Again, the reason I'm hoping cygwin will support UTF-8 for filenames
is so ftp/rsync/unison etc. can work across machines that have
non-single-byte filenames. Currently this doesn't work at all, and
the only way to get it to work is a UTF-8 option in cygwin.
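
A small Python sketch of why a shared UTF-8 encoding would help here. This
is only a model of the problem: `transfer` and `representable` are
hypothetical helpers, and the two code pages (932 on the sender, 1252 on
the receiver) are assumed for the example. The filename bytes are produced
in one codepage and interpreted in another, the way a tool that passes
names through as raw bytes would see them.

```python
def representable(name, codepage):
    """True if every character of name can be encoded in codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def transfer(name, sender_cp, receiver_cp):
    """Encode on the sender, then decode the same bytes on the receiver."""
    return name.encode(sender_cp).decode(receiver_cp, errors="replace")


name = "\u65e5\u672c\u8a9e.txt"  # a Japanese filename

# Different ANSI codepages on each side: the name arrives garbled.
print(transfer(name, "cp932", "cp1252") == name)

# The receiver's codepage cannot even represent the original name.
print(representable(name, "cp1252"))

# UTF-8 on both sides: the name survives the round trip.
print(transfer(name, "utf-8", "utf-8") == name)
```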

Thank you for the detailed reply.


