This is the mail archive of the
mailing list for the Cygwin project.
Re: Intermittent failures retrieving process exit codes
On Dec 21 01:30, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I
> can't rule out other third party kernel mode software possibly
> contributing to the issue. A simple change to Cygwin works around
> the problem for me.
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes
> running on 64-bit Windows 7. I haven't yet tried elsewhere.
> The problem appears to be a race condition involving concurrent
> calls to TerminateProcess() and ExitThread(). The example code
> below minimally mimics the threads created and exit process/thread
> calls that are performed when running Cygwin's false.exe. The
> primary thread exits the process via TerminateProcess() à la
> pinfo::exit() in winsup/cygwin/pinfo.cc. The secondary thread exits
> itself via ExitThread() à la Cygwin's signal processing thread
> function, wait_sig(), in winsup/cygwin/sigproc.cc.
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary
> thread's call to ExitThread(). I can only speculate at this point,
> but my guess is that the TerminateProcess() code disassociates the
> calling thread from the process before other threads are stopped
> such that ExitThread(), concurrently running in another thread, may
> determine that the calling thread is the last thread of the process
> and overwrite the process exit code.
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses
> TerminateProcess() because that is what Cygwin does.
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled
> the code using Microsoft's Visual Studio 2010 x86 compiler with the
> command 'cl /Fetest-exit-code.exe test-exit-code.cpp'
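
The test case itself is not reproduced in this archive copy. The following is only a minimal sketch of the race as described above, not Tom's original test-exit-code.cpp. All names and both exit code values are assumptions chosen for illustration, and the event is merely one way to make the two exit calls run close together.

```cpp
// Sketch only: NOT the original test-exit-code.cpp, which this archive omits.
// The exit code values and every name below are illustrative assumptions.
constexpr unsigned long kProcessExitCode = 1;  // code the primary thread requests
constexpr unsigned long kThreadExitCode  = 2;  // code the secondary thread passes to ExitThread()

#ifdef _WIN32  // Win32-only; the guard just keeps other platforms from building it
#include <windows.h>

static HANDLE g_start;  // manual-reset event that releases the secondary thread

// Secondary thread: waits for the signal, then exits itself, in the way
// the message describes for Cygwin's wait_sig().
static DWORD WINAPI secondary_thread(LPVOID) {
    WaitForSingleObject(g_start, INFINITE);
    ExitThread(kThreadExitCode);
}

int main() {
    g_start = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (!g_start) return 3;  // setup failure, unrelated to the race

    HANDLE thread = CreateThread(NULL, 0, secondary_thread, NULL, 0, NULL);
    if (!thread) return 3;   // setup failure, unrelated to the race

    // Release the secondary thread and immediately terminate the process,
    // in the way the message describes for pinfo::exit().
    SetEvent(g_start);
    TerminateProcess(GetCurrentProcess(), kProcessExitCode);
    return 3;  // not expected to be reached
}
#endif
```

If the race behaves as described, running such a program repeatedly should normally report exit code 1 and occasionally report 2, the secondary thread's code. How often that happens, if at all, will depend on the Windows release and the number of cores.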
Wow. Thanks for this testcase. I tried to reproduce the issue and
I was not able to reproduce it on a single-CPU, single-core setup,
but I could reproduce it almost immediately on a dual-core system,
twice in a row in under 5 secs.
> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to
> ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc. Since this
> thread (probably) doesn't exit until the process is exiting anyway,
> the call to Sleep() does not adversely affect shutdown. The thread
> just gets terminated while in the call to Sleep() instead of exiting
> before the process is terminated or getting terminated while still
> in the call to ExitThread(). A better solution might be to avoid
> the thread exiting at all (so long as it can't get terminated while
> holding critical resources), or to have the process exiting thread
> wait on it. Neither of these is ideal. Orderly shutdown of
> multi-threaded processes is really hard to do correctly on Windows.
> Since the exit code for the signal processing thread is not used,
> having the wait_sig() thread (and any other threads that could
> potentially concurrently exit with another thread) exit with a
> special status value such as STATUS_THREAD_IS_TERMINATING
> (0xC000004BL) would enable diagnosis of this issue as any process
> exit code matching this would be a likely indicator that this issue
> was encountered.
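
The diagnostic idea above could be checked on the parent side roughly as follows. This is a sketch under the assumption that the affected threads really do exit with STATUS_THREAD_IS_TERMINATING. The helper name is hypothetical and is not part of Cygwin or the Windows API.

```cpp
#include <cstdint>

// Value of STATUS_THREAD_IS_TERMINATING as quoted in the message above.
constexpr std::uint32_t kStatusThreadIsTerminating = 0xC000004BU;

// Hypothetical helper: returns true when a process exit code equals the
// sentinel thread exit status, which would suggest (not prove) that a
// thread's ExitThread() code overwrote the intended process exit code.
constexpr bool looks_like_exit_race(std::uint32_t process_exit_code) {
    return process_exit_code == kStatusThreadIsTerminating;
}
```

A caller would pass in whatever value it obtained for the child's exit code. A match is only a likely indicator, since a process could in principle exit with that value for other reasons.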
Maybe the signal thread should really not exit by itself, but just
wait until the TerminateThread is called. Chris?
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Problem reports: http://cygwin.com/problems.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple