1.7.7: rm -rf sometimes fails - race condition?

Steven Hartland killing@multiplay.co.uk
Fri Dec 10 22:30:00 GMT 2010


----- Original Message ----- 
From: "Christopher Faylor"

>>This looks like either a premature return from a syscall or libcall, or like a
>>genuine race in the system.
>>
>>Has anyone seen similar things?
> 
> Yes and you seem to have nailed the problem - it happens when a virus checker
> hooks into a syscall and allows it to return before completion.  I don't think
> we want to modify Cygwin to not trust success return values from system calls.

Is this the age-old delete-on-close raising its ugly head again?

So rm kicks in while a file is share-locked by another process. rm uses the
Cygwin unlink code, which "schedules" the file for deletion and returns
success without the file actually being gone, so when rm comes to delete the
parent dir it fails because the file still exists.
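
To make the failure mode concrete, here is a minimal Python sketch of how a
caller could cope with a parent directory that still reports as non-empty for
a short while after its files were "deleted". The helper name
rmdir_with_retry and the attempt/delay values are hypothetical, chosen only
for illustration; this is not how rm or Cygwin handle it.

```python
import errno
import os
import time


def rmdir_with_retry(path, attempts=10, delay=0.05):
    """Remove an empty directory, retrying while it still looks non-empty.

    Hypothetical helper: returns True once os.rmdir succeeds, False if the
    directory still had entries after the last attempt.
    """
    for _ in range(attempts):
        try:
            os.rmdir(path)
            return True
        except OSError as exc:
            # ENOTEMPTY (EEXIST on some platforms) means an entry is still
            # present, e.g. a file whose deletion is only pending.
            if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise  # any other error is a real failure
            time.sleep(delay)
    return False
```

Retrying only papers over the race: if the other process never closes its
handle, the directory never becomes empty and the helper returns False.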

I finally figured out that this is the cause of unlink in Perl returning
success while the file still existed; I was like WTF!! In our case it was a
shared resource file locked by another process, and this behaviour led to
many hours of head scratching and large amounts of workaround code.

Personally I think the only solution is to remove the delete-on-close code
and fail hard for share-locked files, as that gives a much more predictable
code flow. Having unlink return success when the file has not actually been
deleted by the time it returns is confusing as hell :(

    Regards
    Steve

