This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Fw: File name too long problem -- maybe fix coming?


> From: Christopher Faylor <cgf-use-the-mailinglist-please@cygwin.com>
> 
> On Mon, Jan 07, 2008 at 10:12:22PM -0800, Gregg Tavares wrote:
> >As far as I know the largest any widechar character will be expanded to
> >is 4 bytes so that means PATH_MAX has to be set to either (260 * 4) or
> >if you're going to go for the 32767 NT limit then PATH_MAX needs to be
> >set to (32767*4) because that's the Linux Compatible public interface
> >limits that will make cygwin work regardless of the current codepage.
> >
> >That also means internally cygwin seems to need to use a different
> >constant like CYGWIN_INTERNAL_PATH_MAX if it's going to use widechars
> >internally. Internally it would use 32767 widechar size filename
> >buffers. Externally it would accept and supply 128k 8bit character
> >filename buffers.
> 
> Yes, we know. You can't set the path max any higher than it is for the
> ASCII functions which is one of the reasons why Corinna is moving all of
> the path-handling to use the unicode variety. One of the main
> motivators for doing this is to use the 32K path limit. If you take the
> time to look at what's been done in CVS you'll see that Corinna
> increased the size of the path on 2007-10-10.
> 

I did take time to look at the code in CVS, which is why I posted. The code in CVS has PATH_MAX set to 32760. My point is that's NOT going to work. The 32760 limit in NT is for UTF-16 paths. Those paths have to get converted to and from the current codepage, and in any particular codepage they can get BIGGER than 32760 characters. In order for cygwin to actually handle long filenames, PATH_MAX has to be set to 131040, because if you get a 32760 UTF-16 path out of FindNextFileW and you call WideCharToMultiByte, the resulting string can be up to 131040 bytes long.
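To make the arithmetic concrete, here is a minimal sketch in portable C. The helper name worst_case_multibyte_len and the macro ASSUMED_MAX_BYTES_PER_WCHAR are made up for illustration and are not in the Cygwin source; the factor of 4 is the assumption stated earlier in this thread, not a measured value.

```c
#include <stddef.h>

/* Assumption from this thread: one UTF-16 code unit expands to at most
   4 bytes when converted to the target codepage. */
#define ASSUMED_MAX_BYTES_PER_WCHAR 4

/* Hypothetical helper: worst-case length in bytes of a multibyte path
   converted from a path of wide_units UTF-16 code units. */
static size_t worst_case_multibyte_len(size_t wide_units)
{
    return wide_units * ASSUMED_MAX_BYTES_PER_WCHAR;
}
```

With the 32760 limit currently in CVS this gives 131040, and with the old 260 limit it gives 1040.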

Simplified, the code should look like this:

// limits.h

// the public interface - You can pass in filenames of up to 131040 characters
// but they must convert to 32760 utf-16 characters or less

#define PATH_MAX 131040  // 32760*4

// typical internal cygwin code (pseudocode, I know the real code calls path_conv and other stuff)

#define CYGWIN_INTERNAL_WIDECHAR_PATH_MAX 32760

int open (const char* filename, int oflag, ... /* mode_t mode */)
{
    WCHAR unicodePath[CYGWIN_INTERNAL_WIDECHAR_PATH_MAX];

    // With a source length of -1 the terminating NUL is converted too, so pass the full buffer size.
    int result = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, filename, -1, unicodePath, CYGWIN_INTERNAL_WIDECHAR_PATH_MAX);
    if (result == 0)
    {
        // MultiByteToWideChar returns 0 on failure; GetLastError() reports
        // ERROR_INSUFFICIENT_BUFFER when the converted path does not fit.
        ReportError("filename passed in did not convert to path less than 32760 unicode characters");
        return -1;
    }
    else
    {
         ... CreateFileW(unicodePath, ...);
    }
}

I'll be happy to try to find the time to make these changes, assuming you guys are okay with it, but I don't want to spend hours or days trying to solve this issue only to be told you won't accept the changes because you are against the idea. The only other change I want to make is to the CP_ACP parameter, which selects the current codepage. It will default to CP_ACP, but I want to make it user-settable so it can be set to CP_UTF8, which will solve the other problems I mentioned. Making it settable through LC_CTYPE seems like the way to go.
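As a rough sketch of the LC_CTYPE idea, something like the following could pick the codepage. This is only an illustration under my own assumptions: codepage_from_lc_ctype, equals_ignore_case and the SKETCH_ macros are hypothetical names, the numeric values 0 and 65001 are those of the Win32 constants CP_ACP and CP_UTF8 (repeated so the sketch compiles without windows.h), and locale modifiers such as "@euro" are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Numeric values of the Win32 constants CP_ACP and CP_UTF8. */
#define SKETCH_CP_ACP  0u
#define SKETCH_CP_UTF8 65001u

/* Case-insensitive equality test for two NUL-terminated ASCII strings. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: map an LC_CTYPE value such as "en_US.UTF-8" to a
   codepage. Anything not recognised as UTF-8 falls back to CP_ACP. */
static unsigned codepage_from_lc_ctype(const char *lc_ctype)
{
    const char *dot;

    if (lc_ctype == NULL)
        return SKETCH_CP_ACP;

    /* The charset name, if present, follows the first '.'. */
    dot = strchr(lc_ctype, '.');
    if (dot == NULL)
        return SKETCH_CP_ACP;

    if (equals_ignore_case(dot + 1, "UTF-8") ||
        equals_ignore_case(dot + 1, "utf8"))
        return SKETCH_CP_UTF8;

    return SKETCH_CP_ACP;
}
```

The caller would pass in the result of getenv("LC_CTYPE") and hand the returned value to MultiByteToWideChar and WideCharToMultiByte in place of the hard-coded CP_ACP. An unset variable (NULL) keeps today's behaviour.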

