How to make child of failed fork exit cleanly?

Ryan Johnson ryan.johnson@cs.utoronto.ca
Tue May 3 18:49:00 GMT 2011


On 03/05/2011 11:46 AM, Ryan Johnson wrote:
> 2. When the child does exit, how to prevent finalizers from running 
> for dlls which did not load properly?
> Context for the second question: exiting the child tends to trigger 
> access violations, often in a pthread_mutex destructor call (la-la 
> land). Some of these can be avoided by disabling stack dumping from 
> api_fatal (see separate email about alloca and stack walking), but the 
> others continue to mystify.
>
> Overall, AFAICT, the cygwin dll design assumes that all dlls have 
> loaded properly, and a failed fork breaks that invariant. I worry that 
> some "properly-loaded" dll accesses state of a "not-properly loaded" 
> dependency.
The plot thickens... single-stepping through dll finalization, the crash 
occurs because of a call to __gcc_deregister_frame, which is inserted 
automatically by gcc (to deal with C++ exception handling unwind info?). 
Single-stepping into the call is a descent into chaos, with the end 
result that the process exits from a kernel32.dll call with an error 
code that suggests an access violation occurred (0x000005a).

The cygwin dll in question is statically linked, loaded at the desired 
address, and depends only on cygwin1.dll, cyggcc_s-1.dll, and 
cygstdc++-6.dll (all of which are still loaded; their finalizers have not 
run yet). It had just executed its own global destructors. No global 
initializers had run, because in_forkee was set.

Very strangely, when every child dies (including those automatically 
respawned by Windows), the parent also seg faults when calling 
__gcc_deregister_frame on the same dll! If even one child survives (even 
if many had previously crashed), then no error arises. Even more 
strangely, if I break into a first child which has a good layout (no 
previous failures, current fork will succeed) and delay it long enough 
that the parent times out, the parent still suffers the seg fault! What 
shared state is there that could cause this to happen?

Disabling dll finalization completely when in_forkee==1 gets rid of the 
above problem, but occasionally I'll get a new error in the child:

CloseHandle(pinfo_shared_handle<0x610031BF>) failed void pinfo::release():1040, Win32 error 6
  110356 [main] fork 10556 fork: child -1 - died waiting for longjmp before initialization, retry 0, exit code 0x100, errno 11
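
For reference, the two numeric codes in that output decode as follows: 
Win32 error 6 is ERROR_INVALID_HANDLE, and errno 11 is EAGAIN in newlib's 
errno numbering. The small lookup below only records that mapping; the 
helper names are made up for illustration and are not cygwin functions.

```cpp
#include <string>

// Hypothetical helper: names a few Win32 error codes.
// Code 6 is the one reported by the failed CloseHandle call above.
static std::string describe_win32_error(unsigned code) {
  switch (code) {
    case 5: return "ERROR_ACCESS_DENIED";
    case 6: return "ERROR_INVALID_HANDLE";
    default: return "unknown";
  }
}

// Hypothetical helper: names a few errno values as numbered by newlib.
// Value 11 is the one reported by the failed fork above.
static std::string describe_errno(int value) {
  switch (value) {
    case 11: return "EAGAIN";
    case 12: return "ENOMEM";
    default: return "unknown";
  }
}
```

So the child is closing a handle that is not valid in its own process, 
and the parent's fork gives up with EAGAIN.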

Sometimes, when the child dies as above, the parent will again seg fault 
while deregistering a dll (but not always).
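
To make the workaround concrete, skipping dll finalization while in_forkee 
is set amounts to something like the standalone model below. This is only 
a sketch and not the code in dll_init.cc: the `module` struct, 
`run_dtors_guarded`, `count_dtor`, and the local `in_forkee` variable are 
all stand-ins invented for the example.

```cpp
#include <cassert>
#include <vector>

// Stand-in for cygwin's in_forkee flag: nonzero while a forked child has
// not finished initializing (assumption: modeled here as a plain global).
static int in_forkee = 0;

// Hypothetical model of one loaded dll and its finalizer.
struct module {
  const char *name;   // dll name, for illustration only
  void (*dtor)();     // finalizer to call at detach/exit
  bool dtors_ran;     // true once the finalizer has been called
};

// Counter so the effect of the guard can be observed.
static int dtor_calls = 0;
static void count_dtor() { ++dtor_calls; }

// Run each module's finalizer at most once, but do nothing at all while
// in_forkee is set. Returns the number of finalizers actually run.
static int run_dtors_guarded(std::vector<module> &mods) {
  if (in_forkee)
    return 0;  // half-initialized child: initializers never ran, so skip
  int ran = 0;
  for (module &m : mods) {
    assert(m.dtor != nullptr);
    if (!m.dtors_ran) {
      m.dtor();
      m.dtors_ran = true;
      ++ran;
    }
  }
  return ran;
}
```

With in_forkee set, the call returns 0 and no finalizer is touched; with 
it clear, each finalizer runs exactly once. The guard only hides the 
crash in the child. It does not explain why the parent faults.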

At this point I'm thoroughly confused. Does anyone have some 
enlightenment to offer?

Gory details below...
Ryan

Single-instruction stepping yields the following stack trace (sort of -- 
it doesn't reflect any one stack trace reported by gdb, because the 
stack kept changing). Stack frames marked with '*' are those which I 
suspect are due to a jump into la-la land; those marked with '+' 
correspond to a longjmp call which unwound the stack back to _sigfe an 
unknown number of times (at least twice).

*0x75a81136 in KERNEL32!GetPrivateProfileStructA () from 
/cygdrive/c/Windows/syswow64/kernel32.dll
*0x6115e228 in WaitForSingleObject@8 () from /usr/bin/cygwin1.dll
*0x610d63e5 in muto::acquire (this=0x611700c0, ms=4294967295) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/sync.cc:91
*0x61077dbf in calloc (nmemb=1, size=44) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/malloc_wrapper.cc:106
*0x61003129 in operator new (s=44) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/cxx.cc:23
*0x610ecece in pthread_mutex::init (mutex=0x67f0900c, attr=0x0, 
initializer=0x14) at /home/Ryan/apps/cygwin-src/winsup/cygwin/thread.cc:2746
+0x610c68b5 in __sjfault () from /usr/bin/cygwin1.dll
+0x610eeb63 in pthread_mutex_lock (mutex=0x67f0900c) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/cygtls.h:279
*0x610c6675 in _sigfe () from /usr/bin/cygwin1.dll
*0x610eeb00 in pthread_spinlock::init () at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/thread.cc:2869
*0x610c7dc7 in _sigfe_pthread_mutex_lock () from /usr/bin/cygwin1.dll
*0x67f08a40 in cyggcc_s-1!__gthread_mutex_unlock () from 
/usr/bin/cyggcc_s-1.dll
0x67f054ad in cyggcc_s-1!__deregister_frame_info_bases () from 
/usr/bin/cyggcc_s-1.dll
0x660010d9 in __gcc_deregister_frame () from 
/cygdrive/c/cygwin/home/Ryan/experiments/fork-tests/cygbar.dll
0x61021d1e in per_module::run_dtors (this=0x61251050) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.cc:89
0x61161716 in dll::run_dtors (this=0x61251048) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.h:68
0x61022b36 in dll_list::detach (this=0x611e3440, retaddr=0x6600124d) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.cc:343
#3  0x61022bea in cygwin_detach_dll () at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.cc:954
#4  0x610c6665 in _sigfe () from /usr/bin/cygwin1.dll

Very oddly, the parent process segfaults as well, in the same location 
as the child, when it tries to exit. This only occurs when the child 
crashes often enough that Windows fails to restart it. If the child crashes 
once, but the next child succeeds, the parent does not fault:
#0  0x67f054bc in cyggcc_s-1!__deregister_frame_info_bases () from 
/usr/bin/cyggcc_s-1.dll
#1  0x660010d9 in __gcc_deregister_frame () from 
/cygdrive/c/cygwin/home/Ryan/experiments/fork-tests/cygbar.dll
#2  0x61021d1e in per_module::run_dtors (this=0x61251050) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.cc:89
#3  0x61161766 in dll::run_dtors (this=0x61251048) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.h:68
#4  0x61021d70 in dll_global_dtors () at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dll_init.cc:61
#5  0x611492b7 in __call_exitprocs (code=0, d=0x0) at 
../../../.././newlib/libc/stdlib/__call_atexit.c:116
#6  0x6112152a in exit (code=0) at 
../../../.././newlib/libc/stdlib/exit.c:61
#7  0x61005fcb in cygwin_exit (n=0) at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dcrt0.cc:1111
#8  0x610081c0 in _cygwin_exit_return () at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dcrt0.cc:928
#9  0x61005b36 in _cygtls::call2 (this=0x28ce64, func=0x61007a50 
<dll_crt0_1(void*)>, arg=0x0, buf=0x28cda4)
     at /home/Ryan/apps/cygwin-src/winsup/cygwin/cygtls.cc:69
#10 0x61005bdb in _cygtls::call (func=0x61007a50 <dll_crt0_1(void*)>, 
arg=0x0) at /home/Ryan/apps/cygwin-src/winsup/cygwin/cygtls.cc:62
#11 0x610079bf in _dll_crt0@0 () at 
/home/Ryan/apps/cygwin-src/winsup/cygwin/dcrt0.cc:948
#12 0x004013c2 in cygwin_crt0 ()
#13 0x00401015 in mainCRTStartup ()



