[ANNOUNCEMENT] [1.7] Updated: coreutils-7.0-1

Corinna Vinschen corinna-cygwin@cygwin.com
Tue Dec 16 14:11:00 GMT 2008

On Dec 16 06:25, Eric Blake wrote:
> According to Corinna Vinschen on 12/16/2008 2:20 AM:
> >> unfortunately, is that the Linux patch to use d_type and inode
> >> sorting to speed up rm from quadratic to linear on directories with a
> >> large number of files did not apply to cygwin because of differences in
> >> statfs.
> > 
> > -v?  Is that something we can support by tweaking Cygwin?
> I'm not sure yet.  It doesn't even work on Hurd, and part of the bug is
> coreutils' fault:
> http://lists.gnu.org/archive/html/bug-coreutils/2008-10/msg00005.html
> The problem is that Linux has hardcoded magic constants for various
> filesystem types, returned through struct statfs.f_type, which are

Hmm, Cygwin's statvfs struct doesn't have f_type.

> distinct from magic constants returned by other OSs. 
> [...]
> Also, coreutils currently only sorts large directories, but cygwin reports
> directory st_size as 0 regardless of directory size, so there is no way to
> identify large directories up front.

Not quite.  Did you call `ls -s' on cygwin's / directory lately?  A snippet
from mine on one of my machines looks like this:

160 drwxrwx---+ 1 corinna vinschen       163840 Dec 16 10:13 bin
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 15  2008 cygdrive
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 30  2008 dev
 12 drwxrwx---+ 1 corinna vinschen        12288 Dec 15 11:15 etc
  4 drwxr-xr-x+ 1 corinna vinschen         4096 Jul  4 10:41 home
 40 drwxrwx---+ 1 corinna vinschen        40960 Dec  8 11:58 lib
  0 dr-xr-xr-x  8 corinna vinschen            0 Dec  1  2006 proc
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 15  2008 sbin
  4 drwxrwxrwt+ 1 corinna vinschen         4096 Dec 15 16:35 tmp
  4 drwxrwx---+ 1 corinna vinschen         4096 Dec  8 11:54 usr
  0 drwxr-xr-x+ 1 SYSTEM  Administrators      0 May 21  2008 var

The size of a directory which you just created is 0.  But big
directories (like /bin), or directories which once were big (like /tmp),
have a size which is a multiple of 4K.  This size is what's returned by
the NT function NtQueryInformationFile.  I assume that a directory is
created with one block in a pre-allocated area in the MFT or thereabouts,
which explains size 0.  When the dir grows, normal FS blocks are added,
so the size grows beyond 0.  But actually I have no idea, so it could be
entirely different. :)
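
To check what a given system reports, the sizes can be read directly
with stat().  The following is only a minimal sketch: the helper names
dir_reported_size and print_dir_sizes are made up for illustration, and
it shows nothing beyond the st_size and st_blocks values the C library
hands back.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Fetch the size and block count that stat() reports for the directory
   PATH.  Returns 0 on success, -1 if PATH cannot be stat'ed or is not a
   directory.  (Hypothetical helper, not part of coreutils or Cygwin.)  */
int
dir_reported_size (const char *path, off_t *size, blkcnt_t *blocks)
{
  struct stat st;

  if (stat (path, &st) != 0 || !S_ISDIR (st.st_mode))
    return -1;
  *size = st.st_size;
  *blocks = st.st_blocks;   /* counted in 512-byte units on most systems */
  return 0;
}

/* Print "blocks size path" for each directory in PATHS.  */
void
print_dir_sizes (const char *const *paths, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      off_t size;
      blkcnt_t blocks;

      if (dir_reported_size (paths[i], &size, &blocks) == 0)
        printf ("%8lld %12lld %s\n",
                (long long) blocks, (long long) size, paths[i]);
      else
        printf ("%8s %12s %s\n", "?", "?", paths[i]);
    }
}
```

Calling print_dir_sizes with an array such as { "/bin", "/tmp", "/etc" }
prints one line per directory.  Whether a non-zero size is a usable
"large directory" hint is exactly the open question here, so treat the
numbers as an observation, not a guarantee.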

> /* Return the type of the specified file system.
>    Some systems have statvfs.f_basetype[FSTYPSZ] (AIX, HP-UX, and Solaris).
>    Others have statvfs.f_fstypename[_VFS_NAMELEN] (NetBSD 3.0).
>    Others have statfs.f_fstypename[MFSNAMELEN] (NetBSD 1.5.2).
>    Still others have neither and have to get by with f_type (Linux).
>    But f_type may only exist in statfs (Cygwin).  */

Yeah, but we don't have that.  For type recognition we have
statvfs::f_flag, which is an exact copy of the Windows FS flags, or
mntent::mnt_type, which is the file system name (like "ntfs").  So the
capability is available; it just has to be used.
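
As a rough illustration of the two lookups named above, here is a
sketch built on the standard setmntent/getmntent/endmntent and statvfs
calls.  The function names fs_type_name and fs_flag_bits are
hypothetical, the mount table path is passed in by the caller because
its location differs between systems, and PATH is assumed to be an
absolute, already canonicalized path.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Return nonzero if DIR is PATH itself or one of its parent directories.  */
static int
is_path_prefix (const char *dir, const char *path)
{
  size_t n = strlen (dir);

  if (strncmp (dir, path, n) != 0)
    return 0;
  if (n > 0 && dir[n - 1] == '/')      /* "/" or a trailing slash */
    return 1;
  return path[n] == '\0' || path[n] == '/';
}

/* Copy into BUF the mnt_type of the mount table entry whose mnt_dir is
   the longest prefix of PATH.  Returns 0 on success, -1 if the table
   cannot be opened or no entry matches.  (Hypothetical helper.)  */
int
fs_type_name (const char *mtab, const char *path, char *buf, size_t buflen)
{
  FILE *fp = setmntent (mtab, "r");
  struct mntent *m;
  size_t best = 0;
  int found = 0;

  if (fp == NULL)
    return -1;
  while ((m = getmntent (fp)) != NULL)
    {
      size_t n = strlen (m->mnt_dir);

      /* ">=" lets a later entry for the same directory win.  */
      if (is_path_prefix (m->mnt_dir, path) && n >= best)
        {
          snprintf (buf, buflen, "%s", m->mnt_type);
          best = n;
          found = 1;
        }
    }
  endmntent (fp);
  return found ? 0 : -1;
}

/* Store statvfs's f_flag for PATH in *FLAGS.  Returns 0 or -1.  */
int
fs_flag_bits (const char *path, unsigned long *flags)
{
  struct statvfs sv;

  if (statvfs (path, &sv) != 0)
    return -1;
  *flags = sv.f_flag;
  return 0;
}
```

A caller would typically pass "/etc/mtab" (or whatever the system uses)
as the first argument to fs_type_name.  How the f_flag bits should be
interpreted is platform specific and is not attempted here.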

> [...]
>   And even if the
> coreutils files are improved, we are back to the bigger original question:
> Are there any file systems accessed by cygwin where sorting readdir()
> results into inode order, rather than visiting contents in directory
> listing or name order, provides a speedup by allowing less disk seek time
> (or put another way, do the inode numbers presented by Cygwin for local
> NTFS disks match disk seek order)?  Conversely, are there any file systems
> where taking the time to sort readdir() results is provably a waste (for
> example, a ramdisk, where seek time is instant regardless of inode, or FAT
> and NFS where inode numbers are synthesized with no correlation to disk
> layout,

Interesting question.  NTFS and FAT filesystems are name-sorted by
default.  AFAIK directory changes on FAT are done in-memory, re-sorted,
and then written back as a whole block to disk.  NTFS is using an
always name-sorted B+ tree anyway.  So, as far as I can tell, re-sorting
by inode number would probably not help to speed up rm.  But that's
just me.
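
For anybody who wants to measure this instead of guessing, the general
technique under discussion (collect the readdir() results, then sort
them by inode number before visiting them) can be sketched as below.
This is not the coreutils code; struct name_ino and the function names
are invented for the example.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct name_ino
{
  char *name;   /* malloc'd copy of d_name */
  ino_t ino;    /* d_ino as returned by readdir() */
};

/* qsort comparator: ascending inode number.  */
static int
cmp_ino (const void *a, const void *b)
{
  const struct name_ino *x = a;
  const struct name_ino *y = b;

  return (x->ino > y->ino) - (x->ino < y->ino);
}

/* Free an array produced by read_sorted_by_inode.  */
void
free_name_ino (struct name_ino *v, size_t count)
{
  for (size_t i = 0; i < count; i++)
    free (v[i].name);
  free (v);
}

/* Read DIR_PATH, skipping "." and "..", and store a malloc'd array of
   its entries sorted by inode number in *OUT, with the entry count in
   *COUNT.  Returns 0 on success, -1 on error.  */
int
read_sorted_by_inode (const char *dir_path, struct name_ino **out,
                      size_t *count)
{
  struct name_ino *v = NULL;
  size_t n = 0, cap = 0;
  struct dirent *e;
  DIR *d;

  *out = NULL;
  *count = 0;
  d = opendir (dir_path);
  if (d == NULL)
    return -1;

  while ((e = readdir (d)) != NULL)
    {
      size_t len;

      if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
        continue;
      if (n == cap)
        {
          size_t new_cap = cap ? cap * 2 : 64;
          struct name_ino *tmp = realloc (v, new_cap * sizeof *v);

          if (tmp == NULL)
            goto fail;
          v = tmp;
          cap = new_cap;
        }
      len = strlen (e->d_name) + 1;
      v[n].name = malloc (len);
      if (v[n].name == NULL)
        goto fail;
      memcpy (v[n].name, e->d_name, len);
      v[n].ino = e->d_ino;
      n++;
    }
  closedir (d);

  if (n > 1)
    qsort (v, n, sizeof *v, cmp_ino);
  *out = v;
  *count = n;
  return 0;

fail:
  free_name_ino (v, n);
  closedir (d);
  return -1;
}
```

Timing a removal loop over the array in this order against one in plain
readdir() order would show whether the sort buys anything on a given
file system.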


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
