Improvements to fork handling (1/5)

Christopher Faylor
Sun May 22 20:29:00 GMT 2011

On Sun, May 22, 2011 at 09:04:40AM -0400, Ryan Johnson wrote:
>On 21/05/2011 9:41 PM, Christopher Faylor wrote:
>> On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>>> Hi all,
>>> This is the first of a series of patches, sent in separate emails as
>>> requested.
>>> The first patch allows a child which failed due to address space
>>> clobbers to report cleanly back to the parent. As a result, DLL_LINK
>>> which land wrong, DLL_LOAD whose space gets clobbered, and failure to
>>> replicate the cygheap, generate retries and dispense with the terminal
>>> spam. Handling of unexpected errors should not have changed. Further,
>>> the patch fixes several sources of access violations and crashes,
>>> including:
>>> - accessing invalid state after failing to notice that a
>>> statically-linked dll loaded at the wrong location
>>> - accessing invalid state while running dtors on a failed forkee. I
>>> follow cgf's approach of simply not running any dtors, based on the
>>> observation that dlls in the parent (gcc_s!) can store state about other
>>> dlls and crash trying to access that state in the child, even if they
>>> appeared to map properly in both processes.
>>> - attempting to generate a stack trace when somebody in the call chain
>>> used alloca(). This one is only sidestepped here, because we eliminate
>>> the access violations and api_fatal calls which would have triggered the
>>> problematic stack traces. I have a separate patch which allows offending
>>> functions to disable stack traces, if folks are interested, but it was
>>> kind of noisy so I left it out for now (cygwin uses alloca pretty
>>> liberally!).
>>> Ryan
>>> diff --git a/child_info.h b/child_info.h
>>> --- a/child_info.h
>>> +++ b/child_info.h
>>> @@ -92,6 +92,18 @@ public:
>>>    void alloc_stack_hard_way (volatile char *);
>>> };
>>> +/* Several well-known problems can prevent us from patching up a
>>> +   forkee; when such errors arise the child should exit cleanly (with
>>> +   a failure code for the parent) rather than dumping stack.  */
>>> +#define fork_api_fatal(fmt, args...)					\
>>> +  do									\
>>> +    {									\
>>> +      sigproc_printf (fmt,## args);					\
>>> +      fork_info->handle_failure (-1);					\
>>> +    }									\
>>> +  while(0)
>>> +
>>> +
>>> class fhandler_base;
>>> class cygheap_exec_info
>>> diff --git a/ b/
>>> --- a/
>>> +++ b/
>>> @@ -19,6 +19,7 @@ details. */
>>> #include "dtable.h"
>>> #include "cygheap.h"
>>> #include "pinfo.h"
>>> +#include "child_info.h"
>>> #include "cygtls.h"
>>> #include "exception.h"
>>> #include<wchar.h>
>>> @@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>>>      {
>>>        if (!in_forkee)
>>> 	d->count++;	/* Yes.  Bump the usage count. */
>>> +      else if (d->handle != h)
>>> +	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>>> +			d->name, d->handle, h);
>> You seem to be guranteeing a failure in a condition which could conceivably work
>> ok for simple applications, i.e., if a dll loads in a different location that
>> is not necessarily going to cause a problem.
>By fork semantics the condition *is* a failure. If we try to relax the 
>requirement we risk Bad Things happening, usually in hard-to-diagnose 
>ways. The example I have right off is libgcc_s storing pointers to other 
>dlls and seg faulting when it tries to access pointers which were valid 
>in the parent but not in the child. I prefer a fail-fast solution over 

When you add a failure case like this you are assuming that you
understand all of the parameters and that it will make things better.  I
am not convinced that this won't cause previously working cases to fail.

It is not inconceivable that a DLL could be relocated into another
location and continue to work in a forked process.  Yes, I know this
doesn't match the way fork is supposed to work but I'm not as concerned
about that as I am about Cygwin mailing list complaints about new

The reason I'm objecting to this is because I've considered, from time
to time, adding a similar test but have always avoided it because I
couldn't convince myself that it would help more than it would hurt.  If
we are going to add tests, I'd prefer that the testing be done in
frok:parent when the child_copy happens for static and dynamic dlls,
maybe by adding a dll function which first checks that the data/bss can
be copied to the same location as the parent.

>As it is, I'm pretty nervous that Bad Things could happen at some point 
>with windows dlls mapping to the wrong location (we detect that only 
>when they clobber something cygwin needs), but we can hope that few apps 
>which fork() are also heavy windows dlls.... and cross our fingers.
>>>        d->p = p;
>>>      }
>>>    else
>>>      {
>>> +      if (in_forkee)
>>> +	system_printf ("Unexpected dll loaded during fork: %W", name);
>>> +
>>>        /* FIXME: Change this to new at some point. */
>>>        d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
>>> @@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
>>>              preferred_block = reserve_at (d->name, (DWORD) h);
>>> 	}
>>> -  in_forkee = false;
>>> }
>>> struct dllcrt0_info
>>> diff --git a/dll_init.h b/dll_init.h
>>> --- a/dll_init.h
>>> +++ b/dll_init.h
>>> @@ -57,7 +57,7 @@ struct dll
>>>    int init ();
>>>    void run_dtors ()
>>>    {
>>> -    if (has_dtors)
>>> +    if (has_dtors&&  !in_forkee)
>>>        {
>>> 	has_dtors = 0;
>>> 	p.run_dtors ();
>> Isn't this already handled in
>Yes. I didn't notice you had checked that into CVS (you hadn't yet the 
>time I did look). However, the above does have the advantage of residing 
>in one location rather than 2+.

My change was dated 2011/05/04.  That predates your patch by a week.

It's not an advantage to dive into the function multiple times when you
can short-circuit repeated attempts to call it once.


More information about the Cygwin-patches mailing list