untarring symlinks with ../ fails randomly, slightly OT

Wolf Geldmacher wolf.geldmacher@abacus.ch
Tue Jul 5 15:53:00 GMT 2011

On Tue, 2011-07-05 at 14:10 +0200, Corinna Vinschen wrote:
> On Jul  4 12:46, Corinna Vinschen wrote:
> > On Jul  4 11:15, Wolf Geldmacher wrote:
> > > As an aside:
> > > 	I also used to have some trouble with "rm -rf" of a directory
> > > 	hierarchy failing more or less reproducibly (like: 80% of the
> > > 	time) because files were presumably still "in use". Repeating
> > > 	the command several times would succeed, though.
> > > 
> > > 	Downgrading from cygwin1.dll/ to cygwin1.dll/
> > > 	seems to have solved that issue as well - still have to see
> > > 	the first "retry to delete".
> > > 
> > > This may or may not be related to the original report, as it also reeks
> > > of a race condition during file/directory operations.
> > 
> > I can neither reproduce the tar problem, nor can I reproduce the rm
> > problem.  I tried this under 2008R2 which is basically the same as your
> > W7-64 bit.  I used local and remote drives to test the issue but to no
> > avail.
>
> Finally I managed to reproduce the problem and now I see what happens.
> Windows does not write back the file change timestamp unless the file
> buffers are flushed.  This usually occurs at close time.  In contrast to
> POSIX specifications the timestamps are *not* automatically updated when
> a call to fetch file metadata is performed.
>
> Here's what tar does when creating the symlink:
>
>   1. create file with 000 permissions
>   2. fstat
>   3. close file
>   [...]
>   4. stat file
>   5. if fstat.st_ctime != stat.st_ctime ==> symlink placeholder has been
>      overwritten.
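
To make the sequence above concrete, here is a minimal C sketch of those five steps. It is not tar's actual code: the function name placeholder_ctime_changed is made up for illustration, the file is removed again at the end, and only the st_ctime comparison described above is modelled.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from tar.
 * Returns 1 if the change time seen by fstat() on the open descriptor
 * differs from the one seen by stat() after close(), 0 if both agree,
 * and -1 on error.  The test file is removed before returning.
 */
int placeholder_ctime_changed(const char *path)
{
    struct stat at_create, after_close;

    /* 1. create file with 000 permissions */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0);
    if (fd < 0)
        return -1;

    /* 2. fstat on the still-open descriptor */
    if (fstat(fd, &at_create) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* 3. close file */
    if (close(fd) != 0) {
        unlink(path);
        return -1;
    }

    /* 4. stat file by name (needs no permission on the file itself) */
    int rc = stat(path, &after_close);
    unlink(path);
    if (rc != 0)
        return -1;

    /* 5. a differing ctime is what tar reads as "placeholder overwritten" */
    return at_create.st_ctime != after_close.st_ctime;
}
```

Calling it with the name of a file that does not yet exist should return 0 on a system where close does not touch the change time. A return value of 1 corresponds to the case that makes tar believe the placeholder was replaced.
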
>
> The problem is that the call to fstat on the opened handle gets some
> value of the change time timestamp, but the subsequent close changes
> the timestamp again.
>
> Speculation: It seems that the timestamp fstat sees is the timestamp
> created at the time NtCreateFile is called, while the timestamp from the
> call to NtSetSecurityFile to change the DACL is cached and only updated
> when calling NtClose.
>
> This also explains why this doesn't occur in 1.7.8.  In 1.7.8, the DACL
> has been written using another file handle, because the original handle
> didn't have the right to change the DACL.  By adding the WRITE_DAC flag,
> I allowed Cygwin to use the original file handle to write the DACL.  The
> difference is:
>
> 1.7.8:
>   - create file
>   -   open file for writing the DACL
>   -   write DACL
>   -   close
>   - do whatever the original handle was opened for
>   - close
>
> 1.7.9:
>   - create file
>   - write DACL
>   - do whatever the original handle was opened for
>   - close
>
> So, with 1.7.9 the close call after writing the DACL is missing, which
> accounts for the missing flushing of the file metadata.
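
The difference between the two sequences can be illustrated with a toy model in C. This is not Windows or Cygwin code and calls no real API: it only encodes the speculation above, namely that a DACL write leaves a pending change time on the handle it was made through, and that the pending value becomes visible to metadata queries only when that handle is closed. All names (struct file, struct handle, mismatch_178, mismatch_179) are invented for the sketch.

```c
/* Toy model of deferred timestamp write-back; an assumption, not real API. */
struct file   { long ctime; };                 /* value metadata queries see */
struct handle { struct file *f; long pending; int dirty; };

static long tick;                              /* fake clock */

static struct handle open_handle(struct file *f)
{
    struct handle h = { f, 0, 0 };
    return h;
}

static struct handle create_file(struct file *f)
{
    f->ctime = ++tick;                         /* timestamp set at creation */
    return open_handle(f);
}

static void write_dacl(struct handle *h)
{
    h->pending = ++tick;                       /* cached on the handle only */
    h->dirty = 1;
}

static long query_ctime(const struct file *f)  /* stands in for fstat/stat */
{
    return f->ctime;
}

static void close_handle(struct handle *h)
{
    if (h->dirty) {                            /* write-back happens on close */
        h->f->ctime = h->pending;
        h->dirty = 0;
    }
}

/* 1.7.8 sequence: the DACL is written through a second handle. */
int mismatch_178(void)
{
    struct file f;
    struct handle h = create_file(&f);
    struct handle dacl = open_handle(&f);
    write_dacl(&dacl);
    close_handle(&dacl);                       /* pending time written back */
    long seen_by_fstat = query_ctime(&f);
    close_handle(&h);                          /* nothing pending here */
    return seen_by_fstat != query_ctime(&f);
}

/* 1.7.9 sequence: the DACL is written through the original handle. */
int mismatch_179(void)
{
    struct file f;
    struct handle h = create_file(&f);
    write_dacl(&h);
    long seen_by_fstat = query_ctime(&f);      /* still the creation time */
    close_handle(&h);                          /* only now written back */
    return seen_by_fstat != query_ctime(&f);
}
```

In this model mismatch_178() returns 0 and mismatch_179() returns 1, which matches the observed behaviour. It says nothing about whether the real cause inside Windows is exactly this.
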
>
> By calling FlushFileBuffers in fstat before calling NtQueryInformationFile
> I can fix the problem.  Unfortunately that slows down applications like tar,
> which use fstat a lot, a lot.
>
> There are two solutions: one is reverting to the 1.7.8 state, which
> means that writing the DACL requires opening the file again; the other
> is calling FlushFileBuffers in fstat.
>
> I compared both solutions.  On my hardware, calling tar xzf on your file
> is 500% slower if fstat calls FlushFileBuffers compared to just dropping
> the WRITE_DAC flag from the open call.  Wow!  Imagine that I added the
> WRITE_DAC flag to gain performance...
>
> So I guess this all boils down to the fact that adding WRITE_DAC was
> not really a good move.  It's a shame that Windows punishes every attempt
> to speed up file operations with an increase in non-POSIXy behaviour :-(((
>
> I changed that in CVS and right now I'm generating a new developer
> snapshot on http://cygwin.com/snapshots/.  Give it a try, please.
>
> Thanks,
> Corinna

I downloaded and installed the daily dll: I can no longer reproduce the
"failing symlink" problem at all, although it was 100% reproducible
before. So it looks like your diagnosis and the fix are correct. Thank
you very much for your support!

Regarding the "rm -rf failing" problem: Although I could no longer
reproduce the issue on the test machine when I downgraded to the older
dll, it *did* happen last night on the nightly build with a 1.7.8
cygwin1.dll - so it seems to be unrelated to the WRITE_DAC change, which
incidentally also agrees with Ryan's test results.

Thanks again & Regards

