Cygwin Filesystem Performance degradation 1.7.5 vs 1.7.7, and methods for improving performance

Derry Shribman derry@hola.org
Wed Sep 29 17:51:00 GMT 2010


Hi,

 > Right.  Another way of looking at this is that the mount options offer
 > consistency.  The notion of setting an environment variable in Window A
 > to get one behavior and not setting it in Window B is, IMO, a support
 > nightmare and a recipe for end-user confusion.

1) For applications that set this option internally, there is nothing
confusing or inconsistent. The application author knows that the application
does not use inode/nlink and sets the option in main(). The end user needs to
know nothing about it, and the application's behavior does not change (except
for the increased performance...), since it never uses the inode/nlink info.
The application will behave exactly the same.
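
As a rough sketch of what this could look like in an application (assuming the
proposed patch is applied: "no_ino" and "no_nlink" are the option names
proposed in this thread, not options that a released Cygwin is known to
accept, and declare_no_inode_use() is a made-up helper name), the application
appends the options to whatever CYGWIN value it inherited:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: declare that this program never reads st_ino or
 * st_nlink. The option names are the ones proposed in this thread and are
 * an assumption, not documented Cygwin settings. */
int declare_no_inode_use(void)
{
    const char *extra = "no_ino no_nlink";
    const char *old = getenv("CYGWIN");
    char buf[1024];
    int n;

    /* Keep any options the user already set and append ours. */
    if (old != NULL && *old != '\0')
        n = snprintf(buf, sizeof buf, "%s %s", old, extra);
    else
        n = snprintf(buf, sizeof buf, "%s", extra);

    /* Fail rather than store a truncated value. */
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;

    /* setenv() takes (name, value, overwrite); 1 replaces the old value. */
    return setenv("CYGWIN", buf, 1);
}
```

The application would call declare_no_inode_use() once, at the top of main(),
before it touches the filesystem or starts any subprocess.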

2) For users who want to set this: setting PATH can cause the same
application/bash script to behave completely differently. The same goes for
LD_LIBRARY_PATH, SHELL, COMSPEC, TMPDIR etc. They can cause the same application
to behave differently in different shells. This confuses only end users who
touch things they don't understand. If you don't know what LD_LIBRARY_PATH is:
don't touch it! A Unix system does not try to protect itself from ignorant end
users who touch things they are not supposed to touch (unlike GUI applications,
which try to). Nothing will protect against an end user setting an incorrect
PATH. If an end user does not know what PATH is: he should not touch it!

 > Or, another way of looking at this is, instead of implementing their own
 > potentially buggy, imprecise stat() they could have not thought of
 > Cygwin as a black box and either 1) offered improvements for the DLL or
 > 2) engaged the Cygwin community with requirements.  If there is ifdef'ed
 > __CYGWIN__ code in git now that means that any performance improvements that
 > we (i.e., Corinna) has made will never be noticed and that code will be
 > maintained forever.

And this is exactly what Yoni Londner is trying to do: he not only complained
about performance, but also gave a practical patch that uses setenv("CYGWIN") to
solve the performance problems.

I am sure the git developers were not happy to have to write their own version
of stat() specifically for __CYGWIN__. But it seems that the simple-to-implement
setenv("CYGWIN", "no_ino no_nlink") is being rejected without any good reason.

 > So, you're trading ifdef __CYGWIN__ in git with lots of if's in the very
 > parts of Cygwin code path where people complain about slowness.

The slowness of the Cygwin filesystem calls does not come from if()'s in
Cygwin's code.

A typical CPU today can perform around 1,000,000,000 if()'s per second (around 1
nanosecond per if()), while the 'cost' of a WinNT system call is a minimum of
20,000ns, and many filesystem calls take much longer.

So adding an if() to save a system call (or even 10 if()s...) - is always worth it.
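
With the figures above, 10 if()s cost about 10ns, so they pay for themselves
if they avoid a 20,000ns system call even once in 2,000 executions. A minimal
sketch of such a guard follows; fs_opts, get_nlink() and query_link_count()
are hypothetical names used only for illustration and are not taken from
Cygwin's source or the Win32 API:

```c
#include <stdbool.h>

/* Hypothetical per-process flags, as if parsed from CYGWIN=no_nlink. */
struct fs_opts {
    bool no_nlink;
};

/* Counts how often the expensive path runs; for illustration only. */
int slow_calls = 0;

/* Stand-in for the costly system call (estimated above at 20,000ns or
 * more). This is not a real Cygwin or Win32 function. */
unsigned query_link_count(const char *path)
{
    (void)path;
    slow_calls++;
    return 1;
}

unsigned get_nlink(const struct fs_opts *opts, const char *path)
{
    if (opts->no_nlink)  /* the cheap check, roughly 1ns */
        return 1;        /* caller declared it never looks at st_nlink */
    return query_link_count(path);  /* the expensive path */
}
```

When no_nlink is set, the expensive call is skipped entirely; when it is not
set, the only added cost is the single comparison.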

Derry

On 9/29/2010 5:10 PM, Christopher Faylor wrote:
> On Wed, Sep 29, 2010 at 11:08:21AM +0200, Derry Shribman wrote:
>> Hi,
>>
>>>> Doesn't the 'noacl' mount option provide that already?
>>>
>>> Partially, there are also the ihash and the exec/notexec options.  A lot
>>> has been already discussed on the cygwin-patches list, see, for instance
>>
>> The problem with mount options is that they are 'static'. They require a cygwin
>> 'reboot' and they do not allow 'inheritance' for subprocesses, and do not allow
>> concurrent processes running in different modes.
>
> Right.  Another way of looking at this is that the mount options offer
> consistency.  The notion of setting an environment variable in Window A
> to get one behavior and not setting it in Window B is, IMO, a support
> nightmare and a recipe for end-user confusion.
>
>> Dynamic options via CYGWIN env allow setting stuff in runtime, in /etc/profile,
>> ~/.bashrc, or for specific commands (and their subprocesses), such as:
>> CYGWIN=no_nlink rsync c:/... z:/...
>>
>> This allows the user to be free to decide where to relax POSIX compliance in
>> order to achieve speed.
>>
>> It also allows application developers (such as 'git'), to decide in their code
>> how they want Cygwin to behave.
>> In 'git' for example, it does not need stat's nlink (number of hard links), and
>> actually, not even st_ino (the inode number). Cygwin's git performance was
>> ultra-slow, and they improved it by not using Cygwin's stat(), rather
>> re-implementing their own 'quick-stat' which worked directly with Win32 API.
>>
>> If Cygwin would have supported dynamic options (as opposed to mount time
>> options), instead of the large 'ifdef __CYGWIN__' code, it would simply be
>> adding 'setenv("CYGWIN", "no_nlink no_inode")' to the code in git's main().
>
> Or, another way of looking at this is, instead of implementing their own
> potentially buggy, imprecise stat() they could have not thought of
> Cygwin as a black box and either 1) offered improvements for the DLL or
> 2) engaged the Cygwin community with requirements.  If there is ifdef'ed
> __CYGWIN__ code in git now that means that any performance improvements that
> we (i.e., Corinna) has made will never be noticed and that code will be
> maintained forever.
>
>> This allows applications to declare they will never look into the 'st_ino' and
>> 'st_nlink'. The authors of an application, at the time of writing it, know
>> whether their code accesses these fields or not.
>
> So, you're trading ifdef __CYGWIN__ in git with lots of if's in the very
> parts of Cygwin code path where people complain about slowness.
>
> But, anyway, if we were going to implement something like this, it wouldn't
> be with environment variables, it would be with the proposed api that Eric
> Blake has mentioned in the past.
>
> cgf
>
>


