How to make child of failed fork exit cleanly?

Tue May 3 23:03:00 GMT 2011

On 03/05/2011 2:41 PM, Corinna Vinschen wrote:
> On May  3 11:46, Ryan Johnson wrote:
>> Hi all,
>>
>> I'm working on some changes to fork() which would detect early the
>> case where a parent-child pair have unresolvable differences in
>> address space layout (e.g. thread stacks, heaps, or
>> statically-linked dlls which moved).
>>
>> Detecting the problem turned out to be pretty easy, but making the
>> child exit cleanly is not. This leads to two questions, followed by
>> what I have figured out so far while attempting to answer them
>> myself.
>>
>> 1. What's the best way to make a child process notify the parent
>> that the fork() cannot succeed, and exit cleanly?
> Usually by using some helpful status code which then can be recognized
> by child_info::proc_retry.
Wonderful. I followed the example in heap.cc and called 
fork_info->handle_failure(), and it works beautifully... except when the 
child segfaults.
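
A minimal sketch of the status-code idea, to make the failure mode concrete. It is not Cygwin's child_info::proc_retry; the names classify_exit and should_retry, the kForkLayoutMismatch value, and the attempt limit are all hypothetical. The only real constant is the NTSTATUS value for an access violation, which is what the parent would see in place of the helpful code when the child crashes on its way out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical exit code a forkee could use to say "my address space cannot
// match the parent's". Cygwin's real status values are not reproduced here.
constexpr std::uint32_t kForkLayoutMismatch = 0xE0000001u;

// Real NTSTATUS value reported when a process dies of an access violation.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

enum class ChildFailure { LayoutMismatch, Crash, Other };

// Map the child's exit code to a failure class the parent can act on.
inline ChildFailure classify_exit(std::uint32_t exit_code) {
    if (exit_code == kForkLayoutMismatch) return ChildFailure::LayoutMismatch;
    if (exit_code == kAccessViolation) return ChildFailure::Crash;
    return ChildFailure::Other;
}

// Assumed policy: retry known-mismatch and crash failures until the attempt
// limit is reached; anything unrecognized is not retried.
inline bool should_retry(ChildFailure why, int attempt, int max_attempts) {
    if (attempt >= max_attempts) return false;
    return why == ChildFailure::LayoutMismatch || why == ChildFailure::Crash;
}
```

With this shape, a child that segfaults before it can exit is reported as Crash rather than LayoutMismatch, so the parent loses the information about why the fork really failed.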

>> Given that the cause of the fork failure is known (rather than some
>> surprise or bug), I propose that the messages go to some strace
>> channel (a new one for fork, perhaps?) and that the child exit
>> without attempting to generate a dump file (especially since dump
> Sounds ok to me, if you're really sure that the situation is not
> recoverable.
I wish it were recoverable, because it's a huge pain for applications 
which pull in many statically-linked dlls and then fork (emacs, gcc, 
python, ...). Unfortunately, I know of no way to force-unload a dll 
brought in by the nt loader. If one of those dlls lands in the wrong 
place, we're stuck even if we figure out how to get rid of whatever 
heap/stack/file-map was in the way.

>> generation itself has a tendency to cause crashes). It would also be
>> good, in cases where the parent is the reason for fork failures, to
>> prevent Windows from respawning the process so many times (though it
>> is admittedly handy when the child was the problem and the fork
>> succeeds on the nth try).
> See above.  That's handled in child_info::proc_retry.
I'll keep that in mind, but for now I'm leaving it alone. Technically 
it's always possible the fork could succeed, and I don't know how 
effectively I could identify a bad parent in the general case (other 
than seeing that fork fails repeatedly).

>> All of this still leaves the question of
>> how to exit the child process, "properly" though. Is it necessary to
>> wait for dll initialization to finish first, for example?
> I'm not sure I understand the question.  How do you know which
> DLL is already initialized and which isn't?
I'm talking about a call to dll_list::alloc, due to a DLL_LINK which did 
not map to its parent's address. At this point we know the fork has 
failed and there's no point continuing to try.

When this happens there are several possibilities: {windows, cygwin dll} 
x {DLL_LINK, DLL_LOAD} x {match or not match parent base addr}.
- We know no DLL_LOAD has been mapped, let alone initialized, but AFAICT 
that currently doesn't stop cygwin dll finalizers (copied over from the 
parent) from running. Maybe I missed something here because I'd expect 
this to cause far more trouble than it seems to in practice (I'm still 
testing the statically-linked version of my toy cygwin-breaker).
- For windows DLL_LINK, the initializers may or may not have already 
run, but they're supposed to load/unload correctly even if none of their 
dependencies are available, so I wouldn't expect any trouble from them. 
Incidentally, AFAICT we don't care whether the parent and child bases 
match. I can't find any code that checks whether such dlls loaded at the 
same address, and they don't end up in the dll_list.
- For cygwin DLL_LINK, the initializers did not run (because of 
in_forkee=1), but the initialized data has been copied over from the 
parent. Currently we run the finalizers regardless of whether the 
child's base addresses match those of the parent.  Mayhem results, even 
for matched dlls (see my other email -- perhaps libgcc_s tries to follow 
invalid pointers into mismatched dlls?).
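
The three cases above can be written down as a small lookup, shown here only as a sketch. The enum and function names are hypothetical and are not Cygwin's own types; the mapping simply encodes the observations in the list. Whether the base address matches the parent's is deliberately left out of the second function, since even matched cygwin dlls misbehaved.

```cpp
#include <cassert>

enum class Owner { Windows, Cygwin };
enum class LinkKind { Linked, Loaded };  // stand-ins for DLL_LINK / DLL_LOAD

enum class ForkeeState {
    NotMapped,             // DLL_LOAD: not mapped yet, let alone initialized
    InitUnknown,           // windows DLL_LINK: initializers may or may not have run
    DataCopiedInitSkipped  // cygwin DLL_LINK: parent's data copied, no init (in_forkee=1)
};

// State of a dll in the child at the moment the fork is known to have failed.
inline ForkeeState state_at_failure(Owner owner, LinkKind kind) {
    if (kind == LinkKind::Loaded) return ForkeeState::NotMapped;
    if (owner == Owner::Windows) return ForkeeState::InitUnknown;
    return ForkeeState::DataCopiedInitSkipped;
}

// Cautious reading of the list: only windows DLL_LINK dlls are expected to
// unload without trouble; everything else is treated as unsafe to finalize.
inline bool cleanup_expected_safe(Owner owner, LinkKind kind) {
    return state_at_failure(owner, kind) == ForkeeState::InitUnknown;
}
```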

In other words, if we realize part way through forking that it's not 
going to work, we're in trouble (and "partway" here means everything 
between the parent's call to CreateProcess and assigning 
in_forkee=false). Some dlls are inconsistent and none of their 
finalizers are safe to run; others appear consistent but have been 
poisoned by state from inconsistent dlls (libgcc_s!), so some unknown 
subset of finalizers can be run. Still other dlls really are consistent, 
and it's arguably bad if we don't run their finalizers, but I don't know 
how to identify them.

For the moment I've just disabled all finalizers if in_forkee=1, on the 
premise that it's better to risk not running a valid finalizer than to 
risk running an invalid one. That made the access violations go away, 
though I still occasionally see an error closing the pinfo handle which 
causes the child to abort and the parent to seg fault (why???).
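
In outline, the workaround amounts to a guard in front of the finalizer loop. The sketch below is standalone and does not use Cygwin's actual data structures; FinalizerList and run_all are hypothetical names, and the reverse-registration order is an assumption borrowed from how atexit-style handlers normally run.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the per-process list of dll finalizers.
struct FinalizerList {
    std::vector<std::function<void()>> finalizers;

    // Run every registered finalizer in reverse registration order, unless
    // the process is still a forkee, in which case none of them are trusted.
    // Returns the number of finalizers actually run.
    int run_all(bool in_forkee) {
        if (in_forkee) return 0;  // better to skip a valid one than run an invalid one
        int ran = 0;
        for (auto it = finalizers.rbegin(); it != finalizers.rend(); ++it) {
            (*it)();
            ++ran;
        }
        return ran;
    }
};
```

The guard is all-or-nothing by design: it gives up on the consistent dlls too, because nothing in the child can tell them apart from the poisoned ones.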

Thoughts?
Ryan