(call-process ...) hangs in emacs

Ken Brown kbrown@cornell.edu
Mon Aug 18 12:28:00 GMT 2014

On 8/8/2014 9:26 AM, Ken Brown wrote:
> On 8/7/2014 5:42 PM, Eric Blake wrote:
>> On 08/07/2014 12:53 PM, Ken Brown wrote:
>>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>> I think I found the problem with NORMAL mutexes.  emacs calls
>>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>>> 'prepare' handler locks the mutexes.  (The parent and child handlers
>>>>> unlock them.)  So when emacs calls fork, the mutexes are locked, and
>>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>>> Here's a gdb backtrace showing the sequence of calls:
>>>> Arguably, that's an upstream bug in emacs.  POSIX has declared
>>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>>> because you cannot use it for anything that is not async-signal-safe
>>>> without risking deadlock.  And (except for sem_post()), NONE of the
>>>> standardized locking functions are async-signal-safe.
>>>> http://austingroupbugs.net/view.php?id=858
>>>> That said, it would still be nice to support this, since even though
>>>> the theory says it is broken, there are still lots of (broken)
>>>> programs/libraries still trying to use it.
>>> So what do you think emacs should do instead of using pthread_atfork? Or
>>> is it better to just remove it?  I don't know how likely it is that this
>>> would cause a problem.
>> The POSIX recommendation is that multithreaded apps limit themselves
>> solely to async-signal-safe functions in the window between fork and
>> exec (or to use posix_spawn instead of fork/exec).  I don't know what
>> emacs is trying to do in that window, but at this point, it's certainly
>> worth reporting it upstream.  If you need a pointer to the full list of
>> async-signal-safe functions:
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>> and search for "The following table defines a set of functions that
>> shall be async-signal-safe."
>> The most common deadlocks when violating async-signal-safety rules look
>> like this in single-threaded programs:
>> function calls malloc()
>>    malloc() grabs a non-recursive mutex
>>      async signal arrives
>>        signal handler called
>>          signal handler calls malloc()
>>            malloc() can't grab the mutex - deadlock
>> and this counterpart in multithreaded programs:
>> thread1 calls malloc()
>>    malloc() grabs a non-recursive mutex
>> thread2 gains control and calls fork()
>>    because of the fork, thread1 no longer exists to release the lock
>>    child process calls malloc()
>>      malloc() tries to grab mutex, but it is locked with no thread to release it
>> Switching malloc() to a recursive lock may or may not "solve" the
>> single-threaded deadlock (in that malloc can now obtain the mutex), but
>> it is probably NOT what you want to happen (unless malloc is fully
>> re-entrant, the inner instance will see incomplete data and either be
>> totally clobbered itself, or else totally clobber the outer instance
>> when it returns).  So it's GOOD that malloc does NOT use a recursive
>> mutex by default.
>> In the multithreaded case, you are flat out hosed. Switching to a
>> recursive lock does not change the picture - you are still deadlocked
>> waiting on thread1 to release the lock, but thread1 doesn't exist.
> Thanks for the explanations, Eric.  I've filed an emacs bug report:
>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222
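
For anyone who wants to see the mechanism from the top of the quoted
thread in isolation, here is a minimal sketch.  It is not emacs or Cygwin
code: alloc_lock is a hypothetical stand-in for the allocator's mutex, and
all function names are made up for illustration.  A second 'prepare'
handler stands in for "code that runs inside fork()" and uses
pthread_mutex_trylock so that the demo reports the held lock instead of
hanging.  It relies on the POSIX rule that prepare handlers run in reverse
order of registration.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an allocator's internal lock. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Result of the probe's trylock; -1 means the probe has not run. */
static int probe_result = -1;

/* Same pattern as described above: prepare locks, parent/child unlock. */
static void lock_prepare(void) { pthread_mutex_lock(&alloc_lock); }
static void unlock_after(void) { pthread_mutex_unlock(&alloc_lock); }

/* Stand-in for code running inside fork() after the lock is taken.
   trylock is used so the demo cannot deadlock. */
static void probe_prepare(void)
{
    probe_result = pthread_mutex_trylock(&alloc_lock);
    if (probe_result == 0)
        pthread_mutex_unlock(&alloc_lock);
}

/* Returns the trylock result observed during fork(), or -1 on failure. */
int demo_atfork_lock(void)
{
    static int registered = 0;

    if (!registered) {
        /* Register the probe first: prepare handlers run in reverse
           order, so the probe runs after lock_prepare(). */
        if (pthread_atfork(probe_prepare, NULL, NULL) != 0)
            return -1;
        if (pthread_atfork(lock_prepare, unlock_after, unlock_after) != 0)
            return -1;
        registered = 1;
    }

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: async-signal-safe calls only */

    waitpid(pid, NULL, 0);
    return probe_result;
}
```

Calling demo_atfork_lock() from main should return EBUSY: the lock is
already held when the probe runs.  A blocking pthread_mutex_lock on a
NORMAL mutex at that point would never return, which matches the hang
described above.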

I've just made a new emacs test release that includes a workaround for 
this bug.  I think I see a way to make emacs use Cygwin's malloc; if 
this works, it will provide a better fix for the bug.
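
Separately from the workaround in the test release, the quoted advice
mentions the POSIX spawn interface as an alternative to fork/exec.  A
minimal sketch of that route follows.  The helper name spawn_and_wait is
hypothetical, and nothing here is taken from emacs; it only shows that no
application code runs in the child between process creation and exec.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs argv[0] (looked up via PATH) with the caller's environment.
   Returns the child's exit status, or -1 if it could not be spawned
   or did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an error number directly on failure. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing { "sh", "-c", "exit 7", NULL } should return 7,
assuming sh is on PATH.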

