How to make child of failed fork exit cleanly?
Ryan Johnson
ryan.johnson@cs.utoronto.ca
Tue May 3 23:03:00 GMT 2011
On 03/05/2011 2:41 PM, Corinna Vinschen wrote:
> On May 3 11:46, Ryan Johnson wrote:
>> Hi all,
>>
>> I'm working on some changes to fork() which would detect early the
>> case where a parent-child pair have unresolvable differences in
>> address space layout (e.g. thread stacks, heaps, or
>> statically-linked dlls which moved).
>>
>> Detecting the problem turned out to be pretty easy, but making the
>> child exit cleanly is not. This leads to two questions, followed by
>> what I have figured out so far while attempting to answer them
>> myself.
>>
>> 1. What's the best way to make a child process notify the parent
>> that the fork() cannot succeed, and exit cleanly?
> Usually by using some helpful status code which then can be recognized
> by child_info::proc_retry.
Wonderful. I copied the example of heap.cc in calling
fork_info->handle_failure(), and it works beautifully... except when the
child seg faults.
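
A minimal sketch of the status-code approach, assuming a distinguished exit code that the parent can tell apart from a crash. The names kForkLayoutMismatch, RetryDecision and decide_retry are hypothetical and are not Cygwin's; only the access-violation value is the real NTSTATUS code. It models the kind of decision the thread attributes to child_info::proc_retry, not its actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status the child would exit with when it detects a layout mismatch.
constexpr std::uint32_t kForkLayoutMismatch = 0xC0DE0001u;
// Real NTSTATUS value for STATUS_ACCESS_VIOLATION (what a seg-faulting child reports).
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

enum class RetryDecision { Retry, GiveUp, Unrecognized };

// Parent-side decision: a recognized fork failure is retried a bounded
// number of times; any other exit code is left to other handling.
RetryDecision decide_retry(std::uint32_t child_exit_code, int retries_left)
{
    assert(retries_left >= 0);
    if (child_exit_code != kForkLayoutMismatch)
        return RetryDecision::Unrecognized;
    return retries_left > 0 ? RetryDecision::Retry : RetryDecision::GiveUp;
}
```

The seg fault case is the awkward one: the child never gets to report the distinguished code, so the parent only sees the crash status.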
>> Given that the cause of the fork failure is known (rather than some
>> surprise or bug), I propose that the messages go to some strace
>> channel (a new one for fork, perhaps?) and that the child exit
>> without attempting to generate a dump file (especially since dump
> Sounds ok to me, if you're really sure that the situation is not
> recoverable.
I wish it were recoverable, because it's a huge pain for applications
which pull in many statically-linked dlls and then fork (emacs, gcc,
python, ...). Unfortunately, I know of no way to force-unload a dll
brought in by the nt loader. If one of those dlls lands in the wrong
place, we're stuck even if we figure out how to get rid of whatever
heap/stack/file-map was in the way.
>> generation itself has a tendency to cause crashes). It would also be
>> good, in cases where the parent is the reason for fork failures, to
>> prevent Windows from respawning the process so many times (though it
>> is admittedly handy when the child was the problem and the fork
>> succeeds on the nth try).
> See above. That's handled in child_info::proc_retry.
I'll keep that in mind, but for now I'm leaving it alone. Technically
it's always possible the fork could succeed, and I don't know how
effectively I could identify a bad parent in the general case (other
than seeing that fork fails repeatedly).
>> All of this still leaves the question of
>> how to exit the child process, "properly" though. Is it necessary to
>> wait for dll initialization to finish first, for example?
> I'm not sure I understand the question. How do you know which
> DLL is already initialized and which isn't?
I'm talking about a call to dll_list::alloc, due to a DLL_LINK which did
not map to its parent's address. At this point we know the fork has
failed and there's no point continuing to try.
When this happens there are several possibilities: {windows, cygwin dll}
x {DLL_LINK, DLL_LOAD} x {match or not match parent base addr}.
- We know no DLL_LOAD has been mapped, let alone initialized, but AFAICT
that currently doesn't stop cygwin dll finalizers (copied over from the
parent) from running. Maybe I missed something here because I'd expect
this to cause far more trouble than it seems to in practice (I'm still
testing the statically-linked version of my toy cygwin-breaker).
- For windows DLL_LINK, the initializers may or may not have already
run, but they're supposed to load/unload correctly even if none of their
dependencies are available, so I wouldn't expect any trouble from them.
Incidentally, AFAICT we don't care whether the parent and child bases
match. I can't find any code that checks whether such dlls loaded at the
same address, and they don't end up in the dll_list.
- For cygwin DLL_LINK, the initializers did not run (because of
in_forkee=1), but the initialized data has been copied over from the
parent. Currently we run the finalizers regardless of whether the
child's base addresses match those of the parent. Mayhem results, even
for matched dlls (see my other email -- perhaps libgcc_s tries to follow
invalid pointers into mismatched dlls?).
In other words, if we realize part way through forking that it's not
going to work, we're in trouble (and "partway" here means everything
between the parent's call to CreateProcess and assigning
in_forkee=false). Some dlls are inconsistent and none of their
finalizers are safe to run; others appear consistent but have been
poisoned by state from inconsistent dlls (libgcc_s!), so some unknown
subset of finalizers can be run. Still other dlls really are consistent,
and it's arguably bad if we don't run their finalizers, but I don't know
how to identify them.
For the moment I've just disabled all finalizers if in_forkee=1, on the
premise that it's better to risk not running a valid finalizer than to
risk running an invalid one. That made the access violations go away,
though I still occasionally see an error closing the pinfo handle which
causes the child to abort and the parent to seg fault (why???).
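
Roughly, the workaround amounts to a guard like the one below. This is a self-contained model, not the actual patch: FinalizerList and run_at_exit are hypothetical names, and the in_forkee member merely stands in for the real flag.

```cpp
#include <functional>
#include <vector>

struct FinalizerList {
    bool in_forkee = true;  // stand-in for the real flag; true while the fork is incomplete
    std::vector<std::function<void()>> finalizers;

    // Runs finalizers in reverse registration order, unless the fork never
    // completed, in which case none are run. Returns how many ran.
    int run_at_exit()
    {
        if (in_forkee)
            return 0;  // state copied from the parent may be inconsistent
        int ran = 0;
        for (auto it = finalizers.rbegin(); it != finalizers.rend(); ++it) {
            (*it)();
            ++ran;
        }
        return ran;
    }
};
```

The trade-off is the one stated above: a finalizer that would have been valid is skipped along with the invalid ones.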
Thoughts?
Ryan