This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: cygrunsrv hangs forever on exec error (fix included)


On Nov 12 17:44, Christian Franke wrote:
> Hi,
> 
> if the exec in cygrunsrv fails or the command exits to early, cygrunsrv 
> will hang forever.
> The service can no longer be controlled until cygrunsrv has been kill(-9)ed.
> Auto restart of service does not work in this case.
> 
> 
> Steps to reproduce (at least on XP SP2):
> 
> $ cygrunsrv -I test -p /bin/true
> 
> $ cygrunsrv -S test
> cygrunsrv: Error starting a service: ... Win32 error 1053:
> ...
> 
> $ sleep 3600 #;-)
> 
> $ cygrunsrv -S test
> Service             : test
> Current State       : Start Pending
> Controls Accepted   : Stop
> Command             : /bin/true
> 
> cygrunsrv process has to be killed manually.
> 
> Same result occurs if the command's exe is removed.
> 
> 
> This is due to the following (IMO undocumented and "interesting";-) 
> behavior of the windows SCM:
> 
> The StartServiceControlDispatcher() routine does not return unless some 
> thread (no necessarily service_main()) sets SERVICE_STOPPED.
> 
> The service_main() thread started by SCM is left alone.
> Exiting service_main() does nothing, in particular it does not end 
> StartServiceControlDispatcher().
> 
> But, if SERVICE_STOPPED is set, StartServiceControlDispatcher() 
> immediately returns, again without any care about service_main().
> Because usually now the program exit()'s and kills all threads, code 
> following SERVICE_STOPPED may not be executed at all.

Thanks for this report and the simple testcases.  The description
is very helpful.  I just don't really like the idea to leave the
service_main function through _exit. 

So I rearranged the waitpid
evaluation to accomplish two results, first, except in the neverexit
case, always set the service status to SERVICE_STOPPED *and* do it as
the last action in service_main, second, simplify the code.  What bugged
me was that the WIFEXITED/WIFSIGNALED parts exists twice, even though
there isn't really a lot of difference between the return code from
the first vs. the second waitpid.

I didn't create a new cygrunsrv version for now, instead I'm sending
my diff.  I would like to hear what you think and if I didn't made a
fatal mistake, I'll uplaod a new cygrunsrv version with this changes.


Corinna

	* cygrunsrv.cc (service_main): Simplify waitpid return value
	evaluation.  Always set service status to SERVICE_STOPPED,
	except in the neverexits case.  Move the set_service_status
	call to be always the last action in service_main.

Index: cygrunsrv.cc
===================================================================
RCS file: /cvs/cygwin-apps/cygrunsrv/cygrunsrv.cc,v
retrieving revision 1.28
diff -p -u -r1.28 cygrunsrv.cc
--- cygrunsrv.cc	22 May 2005 16:34:55 -0000	1.28
+++ cygrunsrv.cc	13 Nov 2005 13:10:53 -0000
@@ -1515,81 +1515,78 @@ service_main (DWORD argc, LPSTR *argv)
       report_service_status ();
 
       int status = 1;
-      if (!waitpid (server_pid, &status, WNOHANG))
+      int ret = waitpid (server_pid, &status, WNOHANG);
+      if (!ret)
         {
 	  /* Child has probably `execv'd properly. */
 	  set_service_status (SERVICE_RUNNING);
 	  syslog(LOG_INFO, "`%s' service started", svcname);
 	  /* Keep repeating waitpid() command as long as it returned because
 	     of a handled signal sent to this cygrunsrv process */
-	  int ret;
           while ((ret = waitpid (server_pid, &status, 0)) == -1 &&
 		 errno == EINTR)
 	    ;
+	}
+      else
+        /* The ret == -1 case below is only valid for the inner watpid call. */
+        ret = 0;
 
-	  /* If ret is -1, report errno, else process the status */
-	  if (ret == -1)
+      /* If ret is -1, report errno, else process the status */
+      if (ret == -1)
+	{
+	  switch (errno)
 	    {
-	      switch (errno)
-	        {
-		case ECHILD:
-		  syslog (LOG_ERR, "service `%s' exited, & its status was lost"
-			  " errno ECHILD", svcname);
-		  break;
-		default:
-		  syslog (LOG_ERR, "service `%s' error: waitpid() failed: "
-			  "errno %d", svcname, errno);
-		}
+	    case ECHILD:
+	      syslog (LOG_ERR, "service `%s' exited, its status was lost"
+		      " errno ECHILD", svcname);
+	      break;
+	    default:
+	      syslog (LOG_ERR, "service `%s' error: waitpid() failed: "
+		      "errno %d", svcname, errno);
 	    }
-	  else if (WIFEXITED (status))
+	  service_main_exitval = errno;
+	  set_service_status (SERVICE_STOPPED);
+	}
+      else if (WIFEXITED (status))
+	{
+	  unsigned char s = WEXITSTATUS (status);
+	  if (neverexits && !shutting_down)
 	    {
-	      unsigned char s = WEXITSTATUS (status);
-	      if (neverexits && !shutting_down)
-	        {
-		  syslog (LOG_ERR, "`%s' service exited prematurely with "
-			  "exit status: %u", svcname, s);
-		  /* Do not report that the service is stopped so that if
-		     recovery options are set, Windows will automatically
-		     restart the service. */
-		  service_main_exitval = s;
-		}
-	      else
-	        {
-		  syslog (LOG_INFO, "`%s' service stopped, exit status: %u",
-			  svcname, s);
-		  set_service_status (SERVICE_STOPPED);
-		  service_main_exitval = 0;
-		}
+	      syslog (LOG_ERR, "`%s' service exited prematurely with "
+		      "exit status: %u", svcname, s);
+	      /* Do not report that the service is stopped so that if
+		 recovery options are set, Windows will automatically
+		 restart the service. */
+	      service_main_exitval = s;
 	    }
-	  else if (WIFSIGNALED (status))
+	  else
 	    {
-	      /* If the signal is the one we've send, everything's ok.
-		 Otherwise we log the signal event. */
-	      if (!termsig_sent || WTERMSIG (status) != termsig)
-		syslog (LOG_ERR, "service `%s' failed: signal %d raised",
-			svcname, WTERMSIG (status));
-	      else
-		syslog (LOG_INFO, "`%s' service stopped, signal %d received",
-			svcname, WTERMSIG (status));
-	      set_service_status (SERVICE_STOPPED);
+	      syslog (LOG_INFO, "`%s' service stopped, exit status: %u",
+		      svcname, s);
 	      service_main_exitval = 0;
+	      set_service_status (SERVICE_STOPPED);
 	    }
-	  else
-	    syslog (LOG_ERR, "`%s' service stopped for an unknown reason",
-		    svcname);
-        }
-      else if (WIFEXITED (status))
-        {
-	  /* Although we're not going to set the service status to stopped,
-	     only allow zero exit status if neverexits is not set. */
-	  if (!neverexits)
-	    service_main_exitval = WEXITSTATUS (status);
-	  syslog_starterr ("execv", 0, WEXITSTATUS (status));
 	}
       else if (WIFSIGNALED (status))
-        syslog (LOG_ERR, "starting service `%s' failed: signal %d raised",
-		svcname, WTERMSIG (status));
-      break;
+	{
+	  /* If the signal is the one we've send, everything's ok.
+	     Otherwise we log the signal event. */
+	  if (!termsig_sent || WTERMSIG (status) != termsig)
+	    syslog (LOG_ERR, "service `%s' failed: signal %d raised",
+		    svcname, WTERMSIG (status));
+	  else
+	    syslog (LOG_INFO, "`%s' service stopped, signal %d received",
+		    svcname, WTERMSIG (status));
+	  service_main_exitval = 0;
+	  set_service_status (SERVICE_STOPPED);
+	}
+      else
+	{
+	  syslog (LOG_ERR, "`%s' service stopped for an unknown reason",
+		  svcname);
+	  service_main_exitval = 0;
+	  set_service_status (SERVICE_STOPPED);
+	}
     }
 }
 

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat, Inc.

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]