How to make child of failed fork exit cleanly? (solved)

Wed May 4 05:33:00 GMT 2011

On 03/05/2011 2:49 PM, Ryan Johnson wrote:
> Very strangely, when every child dies (including those automatically 
> respawned by Windows), the parent also seg faults when calling 
> gcc_deregister_frame on the same dll! If even one child survives (even 
> if many had previously crashed), then no error arises. Even more 
> strangely, if I break into a first child which has a good layout (no 
> previous failures, current fork will succeed) and delay it long enough 
> that the parent times out, the parent still suffers the seg fault! 
> What shared state is there that could cause this to happen?
>
> Disabling dll finalization completely when in_forkee==1 gets rid of 
> the above problem, but occasionally I'll get a new error in the child:
>
> CloseHandle(pinfo_shared_handle<0x610031BF>) failed void 
> pinfo::release():1040, Win32 error 6
>  110356 [main] fork 10556 fork: child -1 - died waiting for longjmp 
> before initialization, retry 0, exit code 0x100, errno 11
>
> Sometimes, when the child dies as above, the parent will again seg 
> fault while deregistering a dll (but not always).

Eureka!

Turns out that the pinfo class constructor was empty, leaving its fields 
uninitialized. In particular, pinfo::destroy and pinfo::procinfo were 
highly likely to both contain non-zero garbage values. Later, a call to 
pinfo::init() is supposed to initialize both. However, as the fork error 
says, the child "died... before initialization," causing the parent to 
jump to cleanup and run pinfo::~pinfo ()... which tries to release() 
garbage.  That's why the bug doesn't arise if even one child makes it 
past this point -- pinfo::init would then be called and the destructor 
would do the right thing.

The problem has probably bitten folks off and on before, but my added 
fail-fast code path makes forks which were going to fail usually do so 
"before initialization."

The fix is easy, at least (pinfo.h):
-  pinfo () {}
+  pinfo () : procinfo(NULL), destroy(false) {}
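
To illustrate why the initializer list matters, here is a minimal 
stand-alone sketch of the pattern. It is not the Cygwin code: 
pinfo_sketch, fake_procinfo, release_calls and simulate_fork are made-up 
names, and I am assuming the destructor only releases when destroy is 
set. With the fields zeroed by the constructor, a destructor that runs 
before init() has nothing to release:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real per-process data; illustrative only.
struct fake_procinfo { int pid; };

// Counts how often release() actually freed something.
static int release_calls = 0;

class pinfo_sketch
{
  fake_procinfo *procinfo;
  bool destroy;
public:
  // Mirrors the fix: every field has a known value before init() runs.
  // With an empty constructor both fields would hold indeterminate values.
  pinfo_sketch () : procinfo (NULL), destroy (false) {}

  // Stand-in for pinfo::init(): acquire the resource, arm the destructor.
  void init (int pid)
  {
    assert (!procinfo);  // sketch only: init() is expected to run once
    procinfo = new fake_procinfo;
    procinfo->pid = pid;
    destroy = true;
  }

  // Frees the resource if there is one; safe to call on a fresh object.
  void release ()
  {
    if (!procinfo)
      return;
    delete procinfo;
    procinfo = NULL;
    ++release_calls;
  }

  // Assumed shape of the destructor: release only when armed.
  ~pinfo_sketch ()
  {
    if (destroy)
      release ();
  }
};

// Simulates one fork attempt and returns how many times release()
// freed something during cleanup.
int simulate_fork (bool child_survives)
{
  release_calls = 0;
  {
    pinfo_sketch child;
    if (child_survives)
      child.init (1234);
    // Otherwise the child "died ... before initialization" and we fall
    // straight through to the destructor.
  }
  return release_calls;
}
```

Calling simulate_fork (false) takes the early-failure path and the 
destructor is a no-op, while simulate_fork (true) releases exactly once.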

At this point, the only thing left -- besides cleaning up my fork 
handling code changes to make a patch -- is to verify that it's ok to 
not run any dll finalizers in the child if the fork fails. Empirically 
it seems to do the right thing (child processes no longer fault), but I 
don't know enough about the code base to say with confidence that no 
corner cases exist.

Ryan