(call-process ...) hangs in emacs

Ken Brown kbrown@cornell.edu
Fri Aug 8 13:27:00 GMT 2014


On 8/7/2014 5:42 PM, Eric Blake wrote:
> On 08/07/2014 12:53 PM, Ken Brown wrote:
>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>
>>>> I think I found the problem with NORMAL mutexes.  emacs calls
>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>> 'prepare' handler locks the mutexes.  (The parent and child handlers
>>>> unlock them.)  So when emacs calls fork, the mutexes are locked, and
>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>> Here's a gdb backtrace showing the sequence of calls:
>>>
>>> Arguably, that's an upstream bug in emacs.  POSIX has declared
>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>> because you cannot use it for anything that is not async-signal-safe
>>> without risking deadlock.  And (except for sem_post()), NONE of the
>>> standardized locking functions are async-signal-safe.
>>>
>>> http://austingroupbugs.net/view.php?id=858
>>>
>>> That said, it would still be nice to support this, since even though the
>>> theory says it is broken, there are still lots of (broken)
>>> programs/libraries still trying to use it.
>>
>> So what do you think emacs should do instead of using pthread_atfork? Or
>> is it better to just remove it?  I don't know how likely it is that this
>> would cause a problem.
>
> The POSIX recommendation is that multithreaded apps limit themselves
> solely to async-signal-safe functions in the window between fork and
> exec (or to use posix_spawn instead of fork/exec).  I don't know what
> emacs is trying to do in that window, but at this point, it's certainly
> worth reporting it upstream.  If you need a pointer to the full list of
> async-signal-safe functions:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
> and search for "The following table defines a set of functions that
> shall be async-signal-safe."
>
> The most common deadlocks when violating async-signal-safety rules look
> like this in single-threaded programs:
>
> function calls malloc()
>    malloc() grabs a non-recursive mutex
>      async signal arrives
>        signal handler called
>          signal handler calls malloc()
>            malloc() can't grab the mutex - deadlock
>
> and this counterpart in multithreaded programs:
>
> thread1 calls malloc()
>    malloc() grabs a non-recursive mutex
> thread2 gains control and calls fork()
>    because of the fork, thread1 no longer exists to release the lock
>    child process calls malloc()
>      malloc() tries to grab mutex, but it is locked with no thread to
>      release it - deadlock
>
> Switching malloc() to a recursive lock may or may not "solve" the
> single-threaded deadlock (in that malloc can now obtain the mutex), but
> it is probably NOT what you want to happen (unless malloc is fully
> re-entrant, the inner instance will see incomplete data and either be
> totally clobbered itself, or else totally clobber the outer instance
> when it returns).  So it's GOOD that malloc does NOT use a recursive
> mutex by default.
>
> In the multithreaded case, you are flat out hosed. Switching to a
> recursive lock does not change the picture - you are still deadlocked
> waiting on thread1 to release the lock, but thread1 doesn't exist.
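
To make the pthread_atfork case from the top of the thread concrete, here
is a minimal sketch of that handler pattern.  It is my own illustration,
not the emacs code: alloc_mutex, the handler names, and demo_atfork_window
are hypothetical, and the probe handler only stands in for the calloc call
made inside fork.  It assumes the default mutex type is non-recursive, and
it uses trylock so that it reports the state instead of hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one of the allocator mutexes (default type,
   assumed non-recursive). */
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Result of the trylock attempted inside the fork window. */
static int window_trylock_result = -1;

/* The pattern under discussion: 'prepare' locks, parent and child unlock. */
static void lock_before_fork(void)  { pthread_mutex_lock(&alloc_mutex); }
static void unlock_after_fork(void) { pthread_mutex_unlock(&alloc_mutex); }

/* Stand-in for code that runs inside fork() and needs the same mutex.
   A blocking pthread_mutex_lock() here would never return; trylock only
   records whether the mutex is already held. */
static void probe_in_fork_window(void)
{
    window_trylock_result = pthread_mutex_trylock(&alloc_mutex);
}

/* Returns the trylock result seen in the fork window, or -1 on failure. */
int demo_atfork_window(void)
{
    /* Prepare handlers run in reverse order of registration, so the
       probe, registered first, runs after the mutex has been locked. */
    if (pthread_atfork(probe_in_fork_window, NULL, NULL) != 0)
        return -1;
    if (pthread_atfork(lock_before_fork, unlock_after_fork,
                       unlock_after_fork) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);   /* child: its handler has already unlocked the mutex */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return window_trylock_result;
}
```

Under those assumptions, demo_atfork_window() should return EBUSY: the
mutex is already held at the point where code inside fork would want it,
so a blocking lock there has nobody left to wait for.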

Thanks for the explanations, Eric.  I've filed an emacs bug report:

   http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222
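
For the archives, here is a rough sketch of the two alternatives Eric
mentions: keeping the child to async-signal-safe calls between fork and
exec, or using posix_spawn.  The function names (wait_for_exit,
run_fork_exec, run_posix_spawn) are hypothetical, error handling is
minimal, and this is not what emacs does today.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Collect the child's exit status, or -1 if it did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Option 1: fork/exec, with only async-signal-safe calls in the child
   (execv, write, and _exit are all on the POSIX list). */
int run_fork_exec(const char *path, char *const argv[])
{
    static const char msg[] = "exec failed\n";

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        /* Only reached if execv failed; no malloc, no stdio here. */
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* nothing safe left to do about a failed write */
        }
        _exit(127);
    }
    return wait_for_exit(pid);
}

/* Option 2: posix_spawn, which leaves no user code in the fork window. */
int run_posix_spawn(const char *path, char *const argv[])
{
    pid_t pid;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_for_exit(pid);
}
```

Both take an absolute path plus an argv array, for example "/bin/sh" with
{ "sh", "-c", "exit 7", NULL }, and return the child's exit status.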

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


