This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Shells hang during script execution
- From: Christopher Faylor <cgf-no-personal-reply-please at cygwin dot com>
- To: cygwin at cygwin dot com
- Date: Wed, 1 Mar 2006 13:21:08 -0500
- Subject: Re: Shells hang during script execution
- References: <B6C33E7A8278A0408B707C9B491720D404523B@STEELPO.steeleye.com>
- Reply-to: cygwin at cygwin dot com
On Wed, Mar 01, 2006 at 01:01:46PM -0500, Ernie Coskrey wrote:
>>>Here's a description of a second hang condition we were encountering, along
>>>with a patch for it.
>>>
>>>
>>>The application (pdksh in this case) does a read on a pipe, which eventually
>>>calls pipe.cc fhandler_pipe::read in Thread 1. This creates a new cygthread
>>>with "read_pipe()" as the function. Then it calls th->detach(read_state).
>>>
>>>When the hang occurs, the new thread gets terminated early, before
>>>cygthread::stub() can call "callfunc()". You see the error message
>>>"erroneous thread activation". I'm not sure what's causing the thread
>>>to fail activation, but the result is, the read_state semaphore never
>>>gets signalled.
>>
>>Sorry but this is another band-aid around a problem. The real problem
>>is that the code shouldn't get into the state that you are describing.
>>That's why cygwin prints an error message - it is a serious problem.
>>Making the code deal gracefully with a problem like this isn't going
>>to solve the underlying issue.
>>
>>If you can figure out what's causing the erroneous thread activation
>>then that will be the real culprit.
>>
>>cgf
>>
>
>OK, I believe I've tracked this down.
>
>The problem occurs when we get into a read_pipe cygthread constructor
>(cygthread::cygthread()) with a NULL h and an ev that is signalled.
>When this condition exists, a hang can occur as follows:
>
>1) Creator thread calls detach(). This waits for pipe_state to be released twice
>2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice
>3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately because ev was set when the thread was created.
>4) Creator thread initiates another read_pipe cygthread to read more pipe data.
>
>At this point, there's a race: if the Creator thread gets past the
>initialization part of the constructor, which sets __name(name), BEFORE
>the original read_pipe thread gets to the part of cygthread::stub()
>that sets info->__name = NULL, then you'll see the hang. The new
>pipe_read will give the "erroneous thread activation" message, and the
>parent will be stuck waiting for data that will never arrive.
>
>The only path that leaves an unused thread structure in a state where
>h==NULL and ev is signalled is cygthread::release(). So the fix is
>simple:
>
>$ cat cygthread.cc.udiff
>--- cygthread.cc.ORIG 2006-02-22 10:57:42.123931300 -0500
>+++ cygthread.cc 2006-03-01 12:59:23.255023000 -0500
>@@ -268,7 +268,12 @@
> cygthread::release (bool nuke_h)
> {
>   if (nuke_h)
>+    {
>     h = NULL;
>+
>+    if (ev)
>+      ResetEvent (ev);
>+    }
> #ifdef DEBUGGING
>   __oldname = __name;
>   debug_printf ("released thread '%s'", __oldname);
Nice analysis. Thank you. I think it's easier to fix this by just
making the ev event auto-reset. Then this condition would be caught in
terminate thread, as it was meant to be.
cgf
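
For readers following the thread, the difference between a manual-reset
and an auto-reset event can be sketched as below. This is only a toy
model in portable C++, not the cygthread code: the Event class and
second_wait_sees_stale_signal() are hypothetical names invented for the
illustration, and a mutex plus condition variable stands in for the
Win32 event. It shows why a manual-reset ev that is left signalled makes
a later wait return immediately (step 3 in the analysis above), while an
auto-reset event is cleared by the first successful wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Toy stand-in for a Win32 event object (hypothetical, NOT Cygwin code).
class Event {
public:
  explicit Event(bool manual_reset) : manual_reset_(manual_reset) {}

  // Comparable to SetEvent: mark the event signalled and wake waiters.
  void set() {
    {
      std::lock_guard<std::mutex> lock(m_);
      signalled_ = true;
    }
    cv_.notify_all();
  }

  // Comparable to ResetEvent: clear the signalled state explicitly.
  void reset() {
    std::lock_guard<std::mutex> lock(m_);
    signalled_ = false;
  }

  // Comparable to a timed WaitForSingleObject: returns true if the event
  // was signalled. An auto-reset event clears itself on a successful wait;
  // a manual-reset event stays signalled until reset() is called.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(m_);
    bool ok = cv_.wait_for(lock, timeout, [this] { return signalled_; });
    if (ok && !manual_reset_)
      signalled_ = false;
    return ok;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  bool signalled_ = false;
  bool manual_reset_;
};

// Returns true if a second wait still observes the signal from one set().
bool second_wait_sees_stale_signal(bool manual_reset) {
  Event ev(manual_reset);
  ev.set();
  ev.wait_for(std::chrono::milliseconds(0));         // first wait succeeds
  return ev.wait_for(std::chrono::milliseconds(10)); // stale signal left?
}

// Usage: the manual-reset event leaks the signal, the auto-reset one does not.
void demo() {
  assert(second_wait_sees_stale_signal(true));
  assert(!second_wait_sees_stale_signal(false));
}
```

In this model, the ResetEvent call in the proposed patch corresponds to
calling reset() explicitly on release, whereas the auto-reset approach
removes the stale signal as a side effect of the wait itself.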