Cygwin Filesystem Performance degradation 1.7.5 vs 1.7.7, and methods for improving performance

Yoni Londner yonihola2@gmail.com
Wed Sep 22 05:46:00 GMT 2010


Hi,

 > There's also the problem of handling NFS shares.  However, I just had an
 > idea how to speed up symlink_info::check without neglecting NFS shares.
 > This will take some time, though since it turns a lot of code upside
 > down.  Stay tuned.

This sounds great! Cygwin filesystem performance is a very important 
issue, and any improvement is more than welcome!

 > I don't understand how you think this should work.  The filter expression
 > given to NtQueryDirectoryFile is either a constant string and has to match
 > the filename exactly, or it contains wildcards.  This is documented
 > behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
 > So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
 > does not.

There are two options for stat() and other places that need file info 
(such as symlink_info::check):

1) CreateFile(the_dir), then NtQueryDirectoryFile("foo*") to retrieve 
all the info (including the hardlink), filter the results in user mode 
down to "foo", "foo.exe" and "foo.lnk", and then call CloseHandle().

2) CreateFile(the_dir), NtQueryDirectoryFile("foo"), 
NtQueryDirectoryFile("foo.exe"), NtQueryDirectoryFile("foo.lnk"), 
CloseHandle(). The NtQueryDirectoryFile() calls should pass 
RestartScan=1 so that the the_dir handle can be reused. 
ReturnSingleEntry=1 can also be set to improve performance.

What Cygwin does today instead is:
3) CreateFile("foo"), NtQueryInformationFile(), CloseHandle() (and 
repeat this for "foo.exe" and "foo.lnk")
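To make the user-mode filtering step of option #1 concrete, here is a 
minimal portable C sketch. It assumes the names returned by the single 
"foo*" NtQueryDirectoryFile() call have already been collected into an 
array of plain char strings (the real call returns directory-information 
records with UTF-16 names), and it compares names case-sensitively for 
simplicity, whereas Windows normally matches names case-insensitively. 
The helpers match_probe() and filter_entries() and the probe_suffixes 
table are hypothetical, not Cygwin code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of names probed for a requested BASE:
   BASE itself, BASE.exe, and the possible BASE.lnk symlink file. */
static const char *const probe_suffixes[] = { "", ".exe", ".lnk" };
#define NUM_SUFFIXES (sizeof probe_suffixes / sizeof probe_suffixes[0])

/* Return the index into probe_suffixes that ENTRY corresponds to for
   BASE, or -1 if ENTRY merely matched the "BASE*" wildcard
   (e.g. "foobar" or "foo.exe.bak" for BASE "foo").
   Case-sensitive comparison: a simplifying assumption. */
static int match_probe(const char *base, const char *entry)
{
    size_t n = strlen(base);
    if (strncmp(entry, base, n) != 0)
        return -1;
    for (size_t i = 0; i < NUM_SUFFIXES; i++)
        if (strcmp(entry + n, probe_suffixes[i]) == 0)
            return (int)i;
    return -1;
}

/* Given the COUNT names returned by one "BASE*" directory query, set
   found[i] to 1 if BASE + probe_suffixes[i] is among them, else 0.
   FOUND must have room for NUM_SUFFIXES ints.
   Returns how many of the probed names exist. */
static size_t filter_entries(const char *base, const char *const *entries,
                             size_t count, int found[])
{
    size_t hits = 0;
    for (size_t i = 0; i < NUM_SUFFIXES; i++)
        found[i] = 0;
    for (size_t i = 0; i < count; i++) {
        int k = match_probe(base, entries[i]);
        if (k >= 0 && !found[k]) {   /* count each candidate once */
            found[k] = 1;
            hits++;
        }
    }
    return hits;
}
```

For example, if the "foo*" query returned foo, foo.lnk, foobar and 
foo.exe.bak, filter_entries("foo", ...) reports two hits (foo and 
foo.lnk); foobar and foo.exe.bak matched the wildcard but are discarded. 
The trade-off versus option #2 is that a busy directory may return many 
unrelated "foo*" entries that have to be skipped in user mode.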

I did some performance tests comparing #1, #2 and #3.

I found that #1 and #2 are both around 10x to 100x (!!!) faster 
than #3.

I looked into why, and found that #1 and #2 don't modify the access 
time of the file, whereas #3 does. This alone causes a huge 
performance penalty (and it also violates POSIX: stat("foo") should 
not update the atime of "foo").
Another reason is that the kernel NTFS driver automatically performs 
read-ahead on the file, so a plain stat("foo") (which calls 
CreateFile("foo") in #3) causes the first 64K of "foo" to be read from 
disk, slowing things down tremendously. Think of "ls /bin" with 3500 
files: NTFS reads the first 64K of every one of those 3500 files 
(3500 x 64K, roughly 220 MB)! No wonder it takes so long...
Yet another reason why #3 is so much slower than #1 and #2 is antivirus 
software: nearly all Windows users install an AV (or use Microsoft's AV 
on Win7). These hook and monitor every CreateFile() on a regular file 
(but not on directories), so CreateFile() on a regular file can take far 
longer than CreateFile() on a directory.

I would suggest using #2 over #1, since it's simpler code-wise, and I did 
not see any significant performance difference between the two.

Yoni


On 14/9/2010 12:05 PM, Corinna Vinschen wrote:
> On Sep 13 13:28, Yoni Londner wrote:
>> Hi,
>>
>>> However, isn't that kind of a chicken/egg situation?  If you want to
>>> reuse the content of the FILE_BOTH{_ID}_DIRECTORY_INFORMATION structure
>>> from a previous call to readdir, you would have to call the
>>
>> I am not talking about reusing info from a previous readdir.
>>
>> Every single file cygwin tries to access, it does it in a loop,
>> trying afterwards to check for *.lnk file.
>>
>> Using the directory query operations, it is possible to get this
>> info faster:
>> instead of getting file info for FOO and then for "FOO.lnk",
>> Cygwin can query the directory info for "FOO FOO.LNK" (for the file
>> requested, plus its possible symlink file).
>
> I don't understand how you think this should work.  The filter expression
> given to NtQueryDirectoryFile is either a constant string and has to match
> the filename exactly, or it contains wildcards.  This is documented
> behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
> So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
> does not.
>
> There's also the problem of handling NFS shares.  However, I just had an
> idea how to speed up symlink_info::check without neglecting NFS shares.
> This will take some time, though since it turns a lot of code upside
> down.  Stay tuned.
>
>
> Corinna
>



More information about the Cygwin-patches mailing list