(call-process ...) hangs in emacs
Ken Brown
kbrown@cornell.edu
Mon Aug 18 12:28:00 GMT 2014
On 8/8/2014 9:26 AM, Ken Brown wrote:
> On 8/7/2014 5:42 PM, Eric Blake wrote:
>> On 08/07/2014 12:53 PM, Ken Brown wrote:
>>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>>
>>>>> I think I found the problem with NORMAL mutexes. emacs calls
>>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>>> 'prepare' handler locks the mutexes. (The parent and child handlers
>>>>> unlock them.) So when emacs calls fork, the mutexes are locked, and
>>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>>> Here's a gdb backtrace showing the sequence of calls:
>>>>
>>>> Arguably, that's an upstream bug in emacs. POSIX has declared
>>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>>> because you cannot use it for anything that is not async-signal-safe
>>>> without risking deadlock. And (except for sem_post()), NONE of the
>>>> standardized locking functions are async-signal-safe.
>>>>
>>>> http://austingroupbugs.net/view.php?id=858
>>>>
>>>> That said, it would still be nice to support this: even though the
>>>> theory says it is broken, lots of (broken) programs/libraries
>>>> still try to use it.
>>>
>>> So what do you think emacs should do instead of using pthread_atfork? Or
>>> is it better to just remove it? I don't know how likely it is that this
>>> would cause a problem.
>>
>> The POSIX recommendation is that multithreaded apps limit themselves
>> solely to async-signal-safe functions in the window between fork and
>> exec (or to use posix_spawn instead of fork/exec). I don't know what
>> emacs is trying to do in that window, but at this point, it's certainly
>> worth reporting it upstream. If you need a pointer to the full list of
>> async-signal-safe functions:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>
>> and search for "The following table defines a set of functions that
>> shall be async-signal-safe."
>>
>> The most common deadlocks when violating async-signal-safety rules look
>> like this in single-threaded programs:
>>
>> function calls malloc()
>> malloc() grabs a non-recursive mutex
>> async signal arrives
>> signal handler called
>> signal handler calls malloc()
>> malloc() can't grab the mutex - deadlock
>>
>> and this counterpart in multithreaded programs:
>>
>> thread1 calls malloc()
>> malloc() grabs a non-recursive mutex
>> thread2 gains control and calls fork()
>> in the child, only the forking thread exists, so thread1 is not there
>> to release the lock
>> child process calls malloc()
>> malloc() tries to grab mutex, but it is locked with no thread to
>> release it
>>
>> Switching malloc() to a recursive lock may or may not "solve" the
>> single-threaded deadlock (in that malloc can now obtain the mutex), but
>> it is probably NOT what you want to happen (unless malloc is fully
>> re-entrant, the inner instance will see incomplete data and either be
>> totally clobbered itself, or else totally clobber the outer instance
>> when it returns). So it's GOOD that malloc does NOT use a recursive
>> mutex by default.
>>
>> In the multithreaded case, you are flat out hosed. Switching to a
>> recursive lock does not change the picture - you are still deadlocked
>> waiting on thread1 to release the lock, but thread1 doesn't exist.
>
> Thanks for the explanations, Eric. I've filed an emacs bug report:
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222

I've just made a new emacs test release that includes a workaround for
this bug. I think I see a way to make emacs use Cygwin's malloc; if
this works, it will provide a better fix for the bug.

Ken