How to make child of failed fork exit cleanly? (solved)
Ryan Johnson
ryan.johnson@cs.utoronto.ca
Wed May 4 05:33:00 GMT 2011
On 03/05/2011 2:49 PM, Ryan Johnson wrote:
> Very strangely, when every child dies (including those automatically
> respawned by Windows), the parent also seg faults when calling
> gcc_deregister_frame on the same dll! If even one child survives (even
> if many had previously crashed), then no error arises. Even more
> strangely, if I break into a first child which has a good layout (no
> previous failures, current fork will succeed) and delay it long enough
> that the parent times out, the parent still suffers the seg fault!
> What shared state is there that could cause this to happen?
>
> Disabling dll finalization completely when in_forkee==1 gets rid of
> the above problem, but occasionally I'll get a new error in the child:
>
> CloseHandle(pinfo_shared_handle<0x610031BF>) failed void
> pinfo::release():1040, Win32 error 6
> 110356 [main] fork 10556 fork: child -1 - died waiting for longjmp
> before initialization, retry 0, exit code 0x100, errno 11
>
> Sometimes, when the child dies as above, the parent will again seg
> fault while deregistering a dll (but not always).
Eureka!
Turns out that the pinfo class constructor was empty, leaving its fields
uninitialized. In particular, pinfo::destroy and pinfo::procinfo were
highly likely to both contain non-zero garbage values. Later, a call to
pinfo::init() is supposed to initialize both. However, as the fork error
says, the child "died... before initialization," causing the parent to
jump to cleanup and run pinfo::~pinfo ()... which tries to release()
garbage. That's why the bug doesn't arise if even one child makes it
past this point -- pinfo::init would then be called and the destructor
would do the right thing.
The problem would have bitten folks off and on before, but my added
fail-fast code path makes forks that were going to fail usually do so
"before initialization."
The fix is easy, at least (pinfo.h):
- pinfo () {}
+ pinfo () : procinfo(NULL), destroy(false) {}
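To illustrate why the one-line change is enough, here is a minimal standalone
sketch of the same pattern. It is not the Cygwin source: handle_sketch, record,
release_calls and initialized() are hypothetical names made up for the example,
and only the two fields (procinfo, destroy) mirror the ones named above. The
point is that once the constructor zeroes both fields, the destructor is safe
even when init() is never reached.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-process record; not Cygwin's real type.
struct record { int pid; };

// Counts how many times a real resource was released (illustration only).
static int release_calls = 0;

// Hypothetical class mirroring the shape of the bug: two fields that the
// destructor consults, normally set up later by init ().
class handle_sketch
{
  record *procinfo;
  bool destroy;

public:
  // The fix: give every field a known value at construction, so that
  // release () is a no-op if init () never runs.
  handle_sketch () : procinfo (NULL), destroy (false) {}

  // Copying would lead to a double release, so forbid it in this sketch.
  handle_sketch (const handle_sketch &) = delete;
  handle_sketch &operator= (const handle_sketch &) = delete;

  // Late initialization, analogous in role to the init call described above.
  void init (int pid)
  {
    procinfo = new record;
    procinfo->pid = pid;
    destroy = true;
  }

  // Release only what was actually acquired.
  void release ()
  {
    if (procinfo && destroy)
      {
        delete procinfo;
        ++release_calls;
      }
    procinfo = NULL;
    destroy = false;
  }

  ~handle_sketch () { release (); }

  bool initialized () const { return procinfo != NULL; }
};
```

With the empty constructor, a handle_sketch that is destroyed before init ()
would pass whatever garbage happened to be in procinfo to delete. With the
initializer list, that early-exit path releases nothing.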
At this point, the only thing left -- besides cleaning up my fork
handling code changes to make a patch -- is to verify that it's ok to
not run any dll finalizers in the child if the fork fails. Empirically
it seems to do the right thing (child processes no longer fault), but I
don't know enough about the code base to say with confidence that no
corner cases exist.
Ryan