1.5.20(0.156/4/2) pipe hangs, dos files

Darryl Miles darryl-mailinglists@netbauds.net
Wed Aug 2 17:49:00 GMT 2006


Lev Bishop wrote:
> On 8/1/06, Darryl Miles  wrote:
>> I am still interested in tackling the whole situation but I do need to
>> be furnished with a testcase to work with.  I believe the original
>> comeback by the group of users running "unison" should have insisted a
>> testcase was produced by them to demonstrate the new breakage.
> 
> As I recall, the "group of users running unison" was the exact same
> group as the group who developed the currently-commented-out code in
> select.cc, so there wasn't any particular need for them to provide
> themselves a test case....
> 
> I'm sure it's all explained in the mailing list archives. Basically,
> the NtQueryInformationFile() gives back the amount of non-paged pool
> used by the pipe, which is only the same thing as the amount of data
> available to read in the case that there are no outstanding read()s on
> the pipe. Otherwise, the commented-out code can cause a write()r to
> deadlock any time the process at the other end of the pipe issues a
> read() for more than a pipe buffer's worth of data. This is much worse
> than the current situation, where a non-blocking write can
> occasionally block, which in turn may cause (serious) performance
> issues but rarely a total deadlock. (After all, cygwin is not an rtos
> and there is allowed to have arbitrary delays at any point in the
> code, without violating the posix semantics, so long as eventually the
> write() *eventually* returns.)

Okay, you seem to have some understanding of how and why it failed for
the "unison" group of users.  Do you think the commented-out code is
fixable in any way so that all cases work correctly?

The problem at the moment is that Corinna (and I, for that matter)
would like someone to explain how the NtQueryInformationFile() approach
is broken.

I find it difficult to understand how a query function can have the
side effect of deadlocking other IO.  So, for the uninitiated, I'd like
to hear a clear, simple description of the sequence of events from
someone who understands it.

Maybe the deadlock you are referring to is a problem where
NtQueryInformationFile() fails to see data which is actually in the
pipe, so the deadlock comes from select() never returning the correct
events when it should, i.e. the exact opposite of the current problem
of it always returning writability even when it shouldn't.
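
To make that guess concrete, here is a toy model in C of the failure as
I imagine it.  It assumes, going only on the description quoted above,
that the figure the query reports counts outstanding read requests as
well as buffered data.  The struct and function names are made up for
illustration; none of them are Windows or Cygwin APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: not a Windows or Cygwin data structure. */
struct toy_pipe {
    size_t buffer_size;     /* capacity of the pipe buffer */
    size_t bytes_buffered;  /* data written but not yet read */
    size_t pending_read;    /* size of a read() the reader is blocked in, 0 if none */
};

/* ASSUMPTION: the reported usage includes outstanding read requests,
   capped at the buffer size. */
static size_t toy_reported_usage(const struct toy_pipe *p)
{
    assert(p->buffer_size > 0);
    size_t used = p->bytes_buffered + p->pending_read;
    return used > p->buffer_size ? p->buffer_size : used;
}

/* What select() would conclude if it trusted the reported figure. */
static bool toy_select_writable(const struct toy_pipe *p)
{
    return toy_reported_usage(p) < p->buffer_size;
}

/* Ground truth: there is room whenever the buffer is not full. */
static bool toy_really_writable(const struct toy_pipe *p)
{
    return p->bytes_buffered < p->buffer_size;
}
```

With an empty 4096-byte pipe and a reader blocked in an 8192-byte
read(), toy_select_writable() says "not writable" while
toy_really_writable() says "writable".  If the writer waits in select()
and the reader waits for data, neither ever proceeds.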



If we can all get to that level of understanding (you, Corinna and I),
then maybe we can all take a look at my proposed approach to the
problem: converting all writes (blocking and non-blocking alike) on
pipes into overlapped IO requests and double buffering the written data.
Any blocking semantics we need are created in Cygwin code by putting the
thread to sleep.  This also means we should be able to wake up correctly
for signals too.
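
As a rough sketch of the double-buffering part only, the idea is that
each write takes a private copy of the caller's data, so write() can
return while the IO is still in flight.  The C below is an illustration
under that assumption, not Cygwin code: the queue types and the
wq_push()/wq_pop() names are hypothetical, and the point where the real
overlapped write would be started is only marked by a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One queued write: a private copy of the caller's data. */
struct pending_write {
    struct pending_write *next;
    size_t len;
    unsigned char data[];   /* flexible array member holding the copy */
};

struct write_queue {
    struct pending_write *head, *tail;
};

/* Copy the caller's bytes and queue them.
   Returns 0 on success, -1 if memory could not be allocated. */
static int wq_push(struct write_queue *q, const void *buf, size_t len)
{
    struct pending_write *w = malloc(sizeof *w + len);
    if (w == NULL)
        return -1;
    w->next = NULL;
    w->len = len;
    if (len > 0)
        memcpy(w->data, buf, len);
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    /* Real code would start the overlapped write on w->data here. */
    return 0;
}

/* Completion of the oldest queued write: release its copy.
   Returns the number of bytes that write carried. */
static size_t wq_pop(struct write_queue *q)
{
    struct pending_write *w = q->head;
    assert(w != NULL);
    size_t len = w->len;
    q->head = w->next;
    if (q->head == NULL)
        q->tail = NULL;
    free(w);
    return len;
}
```

Because the queue owns its copy, the caller is free to reuse or modify
its own buffer as soon as wq_push() returns, which is what lets a
non-blocking write() return immediately.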

Kernel buffer resource limits are imposed by a simple counter of
outstanding bytes, so we start returning EAGAIN once more than the
order of 'ulimit -p' worth of writes is outstanding.
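
A minimal sketch of that counter in C might look like the following.
The names (struct out_counter, oc_submit(), oc_complete()) and the way
the limit is supplied are assumptions for illustration.  It also accepts
short writes for simplicity, so it ignores the POSIX rule that writes of
PIPE_BUF bytes or fewer must be atomic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct out_counter {
    size_t limit;        /* hypothetical cap, e.g. derived from 'ulimit -p' */
    size_t outstanding;  /* bytes queued but not yet completed */
};

/* Non-blocking submit: accept as many bytes as still fit under the
   limit, or fail with EAGAIN when there is no room at all. */
static ssize_t oc_submit(struct out_counter *c, size_t len)
{
    assert(c->outstanding <= c->limit);
    size_t room = c->limit - c->outstanding;
    if (len == 0)
        return 0;
    if (room == 0) {
        errno = EAGAIN;
        return -1;
    }
    size_t n = len < room ? len : room;
    c->outstanding += n;   /* real code would queue the data here */
    return (ssize_t)n;
}

/* Called when a queued write of len bytes has completed. */
static void oc_complete(struct out_counter *c, size_t len)
{
    assert(len <= c->outstanding);
    c->outstanding -= len;
}
```

A blocking write() would be built on top of this by sleeping until
oc_complete() frees some room, rather than returning EAGAIN.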

Checking the writability of a given FD is then simply a case of
revalidating whether the outstanding byte counter has dropped below the
low-water buffering mark, and also providing a wakeup to select() in
every case that it does.
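
In C, the writability check and the wakeup could be sketched roughly as
below.  This is only an illustration: struct wr_state and its functions
are hypothetical, and a plain counter of wakeups stands in for whatever
mechanism would really signal a thread waiting in select().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wr_state {
    size_t low_water;    /* FD is writable while outstanding < low_water */
    size_t outstanding;  /* bytes queued but not yet completed */
    unsigned wakeups;    /* stand-in for signalling select() */
};

/* The check select() would make for writability. */
static bool wr_writable(const struct wr_state *s)
{
    return s->outstanding < s->low_water;
}

/* A queued write of len bytes has completed.  If this takes the counter
   from at-or-above the low-water mark to below it, wake select(). */
static void wr_complete(struct wr_state *s, size_t len)
{
    assert(len <= s->outstanding);
    bool was_writable = wr_writable(s);
    s->outstanding -= len;
    if (!was_writable && wr_writable(s))
        s->wakeups++;   /* real code would signal the waiter here */
}
```

The wakeup fires only on the transition across the mark, so a waiter in
select() is woken once per crossing rather than on every completion.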



Again, thank you for your response.  The main problem with this issue
is that not many people know much about the history and the technical
reasons.

Darryl



