This is the mail archive of the cygwin mailing list for the Cygwin project.



RE: Shells hang during script execution


We've identified two hang conditions and developed fixes for both.  This message describes the first and includes a patch; I'll follow up with a description and patch for the second.


If a signal can't be handled because it is blocked, it gets queued (on 
the process's "sigq") to be handled later. Whenever the process's 
signal mask changes (e.g., the signal in question gets unblocked), an 
attempt is made to handle all the queued signals (i.e., a signal flush 
occurs). However, if the blocked signal is queued just after the mask 
change, the flush triggered by that change has already run, and the 
signal is missed. It sits on the queue, but nothing prompts the process 
to check for it, so the process hangs until another signal is sent to 
it.

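The workaround is to force the signal queue to be rescanned (flushed) 
whenever we add something to it, so a queued signal is never missed. 
The patch also moves the SIGCHLD "clearwait" assignment ahead of the 
queueing code, presumably so that the new jump to the flush label does 
not skip it.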
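
To make the ordering problem concrete, here is a small single-threaded model of the interleaving. It is only a sketch: ToySignalState, signal_is_stranded and the signal number are names made up for this illustration, and none of it is the actual Cygwin sigq code. It replays the three steps in a fixed order (signal seen as blocked, mask changes and flushes, signal queued) with and without a flush after the add.

```cpp
#include <deque>
#include <set>
#include <vector>

// Toy model only: these names are hypothetical, not Cygwin's data structures.
struct ToySignalState {
    std::set<int> blocked;       // current signal mask
    std::deque<int> pending;     // stands in for the process's sigq
    std::vector<int> delivered;  // signals that reached a handler

    // Rescan the queue and deliver everything that is no longer blocked.
    void flush() {
        std::deque<int> still_blocked;
        for (int sig : pending) {
            if (blocked.count(sig))
                still_blocked.push_back(sig);
            else
                delivered.push_back(sig);
        }
        pending.swap(still_blocked);
    }

    // A mask change triggers a flush of the queue.
    void unblock(int sig) {
        blocked.erase(sig);
        flush();
    }
};

// Replays the problematic ordering and reports whether the signal is left
// sitting on the queue with nothing scheduled to look at it.
bool signal_is_stranded(bool flush_after_add) {
    const int kSig = 17;  // arbitrary number chosen for the illustration
    ToySignalState st;
    st.blocked.insert(kSig);

    // Step 1: the signal arrives and is found to be blocked.
    bool was_blocked = st.blocked.count(kSig) > 0;

    // Step 2: before it is queued, the mask changes; the flush sees an
    // empty queue and does nothing.
    st.unblock(kSig);

    // Step 3: the signal is queued based on the now-stale check.
    if (was_blocked) {
        st.pending.push_back(kSig);
        if (flush_after_add)
            st.flush();  // the idea behind the workaround: rescan after adding
    }
    return !st.pending.empty();
}
```

In this model signal_is_stranded(false) returns true (the signal stays queued), while signal_is_stranded(true) returns false because the extra flush delivers it.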


--- sigproc.cc.ORIG	2006-02-16 14:02:42.814320000 -0500
+++ sigproc.cc	2006-02-22 10:55:20.327209900 -0500
@@ -1130,6 +1130,7 @@
 	case __SIGNOHOLD:
 	case __SIGFLUSH:
 	case __SIGFLUSHFAST:
+flush:
 	  sigq.reset ();
 	  while ((q = sigq.next ()))
 	    {
@@ -1150,6 +1151,8 @@
 	  else
 	    {
 	      int sig = pack.si.si_signo;
+	      if (sig == SIGCHLD)
+		clearwait = true;
 	      // FIXME: REALLY not right when taking threads into consideration.
 	      // We need a per-thread queue since each thread can have its own
 	      // list of blocked signals.  CGF 2005-08-24
@@ -1165,10 +1168,11 @@
 			system_printf ("Failed to arm signal %d from pid %d", pack.sig, pack.pid);
 #endif
 		      sigq.add (pack);	// FIXME: Shouldn't add this in !sh condition
+		      goto flush; // signal may have become unblocked while
+		                  // we were processing it (before we added
+			          // it to the sigq) -- flush sigq to be sure	
 		    }
 		}
-	      if (sig == SIGCHLD)
-		clearwait = true;
 	    }
 	  break;
 	}
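
For context on the symptom in the quoted messages below, this is roughly how a shell waits for a child with sigsuspend(). It is a minimal POSIX sketch, not pdksh's actual code; the function and handler names are invented for the example. If the SIGCHLD is never delivered, the loop around sigsuspend() never exits, which matches the hang we observed.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Set by the handler; hypothetical name used only in this sketch.
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int) { got_sigchld = 1; }

// Forks a child that exits at once, then waits for SIGCHLD in sigsuspend().
// Returns true if the child was reaped with exit status 0.
bool wait_for_child_with_sigsuspend() {
    got_sigchld = 0;

    struct sigaction sa = {};
    struct sigaction old_sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return false;

    // Block SIGCHLD so it cannot slip in between the flag test and the
    // call to sigsuspend().
    sigset_t block, old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    pid_t pid = fork();
    if (pid == 0)
        _exit(0);  // child: finish immediately
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old_mask, nullptr);
        sigaction(SIGCHLD, &old_sa, nullptr);
        return false;
    }

    // sigsuspend() atomically installs a mask with SIGCHLD unblocked and
    // sleeps until a handler has run.  A lost SIGCHLD means it never wakes.
    sigset_t wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (!got_sigchld)
        sigsuspend(&wait_mask);

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);

    // Restore the caller's mask and SIGCHLD disposition.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    sigaction(SIGCHLD, &old_sa, nullptr);
    return reaped == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

On a system that delivers the signal correctly the function returns promptly. Sending the signal by hand, as with the "kill -CHLD <pid>" mentioned below, is what gets a stuck waiter out of sigsuspend().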

> -----Original Message-----
> From: Ernie Coskrey 
> Sent: Friday, February 10, 2006 1:31 PM
> To: Ernie Coskrey; 'cygwin@cygwin.com'
> Subject: RE: Shells hang during script execution
> 
> 
> We've been able to narrow this down some more.  The shell 
> gets hung in sigsuspend(), waiting for SIGCHLD.  We've 
> verified that the process that's executed as part of the 
> command substitution does complete, and returns EOF, and the 
> shell (we're testing with pdksh) goes into sigsuspend and 
> never comes out.
> 
> If we execute "kill -CHLD <pid>", the shell resumes its processing.
> 
> I'm going to continue to look into this - if anybody has any 
> insight into how SIGCHLD might be getting lost, please let me 
> know.  Thanks!
> 
> Ernie Coskrey
> 
> 
> -----Original Message-----
> From: Ernie Coskrey
> Sent: Wed 2/1/2006 3:27 PM
> To: 'cygwin@cygwin.com'
> Subject: Shells hang during script execution
>  
> I've run into problems with shell scripts hanging during 
> execution for no apparent reason.  I've narrowed down my test 
> case to two simple shell scripts.  To reproduce the problem, 
> I ran three instances of the "top.sh" script included here, 
> and after a bit (30 minutes to an hour or so) I'll see that 
> one or two of the shells have just stopped in their tracks.
> 
> Here are the scripts:
> 
> ----<top.sh>----
> dir=$1
> loops=$2
> 
> for loop in `seq 1 $loops`
> do
>         x=`./subtest.sh $dir`
>         date
>         echo loop $loop
> done
> 
> ----<subtest.sh>----
> for j in `ls $1`
> do
>         if [ `echo $j | egrep -i "A|B" | wc -l` -ne 0 ]
>         then
>                 echo $j
>         fi
> done
> echo subtest1 done >&2
> 
> --------
> 
> I then ran three bash shells.  The commands I ran, 
> simultaneously, were:
> 
> 1) ./top.sh C:/ 600
> 2) ./top.sh C:/windows 300
> 3) ./top.sh C:/windows/system32 100
> 
> These ran for about 45 minutes, and then I noticed that two 
> of them (1 and 2 above) had stopped printing any output.  The 
> third was still moving along.  The third completed, but the 
> first two never progressed any further.  I used Process 
> Explorer from ntinternals.com, and saw that the two hung 
> shells were not using any CPU, and did not have any child 
> processes created; they were simply stopped.  If a process 
> dump would be helpful, I can generate one with Windbg or gdb.
> 
> -----
> Ernie Coskrey       SteelEye Technology, Inc.    803-461-3875
> 
> 


