This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



SPARSE files considered harmful - please revert


This patch is a bad idea.

2003-02-18  Vaclav Haisman  <V.Haisman@sh.cvut.cz>
	* fhandler_disk_file.cc: Include winioctl.h for DeviceIoControl.
	(fhandler_disk_file::open): Set newly created and truncated files as
	sparse on platforms that support it.

As someone on the mailing list asked, "If making every file sparse is
such a good idea, why isn't it the default?".

In my experience, sparse files take up much more disk space than
non-sparse files, and are also significantly slower.

I build software.  My build trees have 50000 files, average size 8k.
When I copied build trees to a Win2000 NTFS disk using Cygwin tools
(either cp or tar or rsync) the actual space used on the disk (as
reported by df, not du) quintupled.

Here's what I think is happening.  Sparse files are implemented like
compressed files, with space allocated in units of 16 clusters.  See
this web page:

http://www.storageadmin.com/Articles/Index.cfm?ArticleID=15900&pg=1&show=654

As a result, a non-empty but small sparse file takes up a minimum of
16*clustersize bytes on the disk.  My measurements suggest an overhead
of 32kb per file with a cluster size of 4kb.

Here are some experiments to support my claims.  Files created by
MKS's commands take up 5 times less disk space than files created by
Cygwin commands.

----------------------------------------------------------------
In 1.3.22:
cpdir is a trivial script that does basically 
(cd $dir1; tar cf - .) | (cd $dir2; tar xf -)
`cp -pr' works the same way.
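
A minimal sketch of what such a cpdir script could look like.  Only
the tar pipeline and the "==> mkdir -p" line are taken from this
message; the function form, the argument names and the error handling
are assumptions, not the actual script.

```sh
#!/bin/sh
# cpdir SRC DST: copy the tree under SRC into DST through a tar pipe.
# Hypothetical reconstruction; the real script is not shown in this post.
cpdir() {
    src=$1
    dst=$2
    # Echo the mkdir step the way the transcript shows it.
    echo "==> mkdir -p $dst"
    mkdir -p "$dst" || return 1
    # Archive SRC to stdout, unpack from stdin inside DST.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)
}

# Usage (as in the transcript): cpdir dev2 copy-of-dev2
```

Every file in the destination is newly created by tar, so on an
affected Cygwin version each one would be created as a sparse file.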

# Use Cygwin commands to create a huge file tree
#
$ df .; cpdir dev2 copy-of-dev2; df .
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      6001      5491  53% /d
==> mkdir -p copy-of-dev2
cpdir dev2 copy-of-dev2  17.46s user 53.72s system 18% cpu 6:33.99 total
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      8438      3054  74% /d
$ du -sm dev2 copy-of-dev2
419	dev2
419	copy-of-dev2
du -h -sm dev2 copy-of-dev2  5.64s user 16.36s system 76% cpu 28.784 total


----------------------------------------------------------------

After reverting to 1.3.20, or patching latest CVS:
I used this method to reclaim disk space that was eaten up by the
SPARSE file disk hog.

$ df .; mv ws ws-old; cpdir ws-old ws; df .
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      6910      4582  61% /d
==> mkdir -p ws
cpdir ws-old ws  58.68s user 225.50s system 19% cpu 23:44.30 total
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      9085      2407  80% /d
$ df .; rm -rf ws-old; df .
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      9085      2407  80% /d
rm -rf ws-old  21.86s user 71.33s system 38% cpu 4:01.85 total
Filesystem    Type   1M-blocks      Used Available Use% Mounted on
d:          system       11492      3689      7803  33% /d
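
The df figures in the two transcripts above can be turned into ratios
with a little shell arithmetic.  This is only a sketch: delta and
ratio_x10 are hypothetical helper names, the numbers are the "Used"
columns (in MB) copied from the output above, and it assumes nothing
else wrote to the disk during the runs.

```sh
#!/bin/sh
# delta BEFORE AFTER: change in df "Used" between two readings, in MB.
delta() { echo $(( $2 - $1 )); }

# ratio_x10 A B: A/B times 10, truncated (integer math, so 58 means 5.8).
ratio_x10() { echo $(( $1 * 10 / $2 )); }

# 1.3.22: copying dev2 (419 MB according to du) moved Used 6001 -> 8438.
sparse_copy=$(delta 6001 8438)        # space consumed by the sparse copy
ratio_x10 "$sparse_copy" 419          # prints 58

# After reverting: the fresh copy of ws moved Used 6910 -> 9085, and
# removing the old sparse tree moved it 9085 -> 3689.
plain_ws=$(delta 6910 9085)           # space consumed by the new copy
sparse_ws=$(delta 3689 9085)          # space freed by rm -rf ws-old
ratio_x10 "$sparse_ws" "$plain_ws"    # prints 24
```

By these figures the sparse copy of dev2 consumed about 5.8 times its
du size, and the old ws tree held about 2.4 times the space of its
fresh copy.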


----------------------------------------------------------------

If you run these experiments yourself, I'm sure you will see the same
results.  To reproduce this problem, you need NTFS 5.0 on Windows
2000.  Sparse files are a recent NTFS feature.

The patch is obvious, but I'll send it to cygwin-patches anyway.

Without this patch, Cygwin is unusable for me.

Martin


