(call-process ...) hangs in emacs

Ken Brown kbrown@cornell.edu
Thu Aug 7 18:54:00 GMT 2014

On 8/7/2014 8:51 AM, Corinna Vinschen wrote:
> Hi Ken,
> On Aug  7 07:51, Ken Brown wrote:
>> Hi Corinna,
>> On 8/5/2014 2:40 PM, Corinna Vinschen wrote:
>>> I'm glad to read that, but I'm still a little bit concerned.  If your
>>> code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you
>>> *might* miss an error case.
>>> I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the
>>> threads calling it.  It looks like the same thread calls malloc from
>>> malloc for some reason and it might be interesting to learn how that
>>> happens and if it's really ok in this scenario, because it seems to
>>> be unexpected by the code.
>> I think I found the problem with NORMAL mutexes.  emacs calls pthread_atfork
>> after initializing the mutexes, and the resulting 'prepare' handler locks
>> the mutexes.  (The parent and child handlers unlock them.)  So when emacs
>> calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL
>> calls calloc, leading to a deadlock. Here's a gdb backtrace showing the
>> sequence of calls:
> First question:  Why does emacs use its own malloc on Cygwin rather
> than the system-provided one?  Is that really necessary?

Cygwin's malloc lacks a few features that emacs requires because of the
unusual way emacs is built.  The most important such features (or maybe
even the only ones) are malloc_set_state and malloc_get_state.

>> #0  malloc (size=size@entry=40) at gmalloc.c:919
>> #1  0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510
>> #2  0x61082074 in calloc (nmemb=1, size=40)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100
>> #3  0x61003177 in operator new (s=s@entry=40)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23
>> #4  0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 <reent_data+852>,
>>      attr=0x0, initializer=0x12)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118
>> #5  0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 <reent_data+852>)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170
>> #6  0x611319d8 in __fp_lock (ptr=0x61187cd0 <reent_data+752>)
> Right, __fp_lock needs a pthread lock and since this lock hasn't been
> used yet, it has to create it.  The pthread_mutex creation calls the
> new operator which in turn calls calloc.
>>      at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287
>> #7  0x61154f75 in _fwalk (ptr=0x28d544,
>>      function=function@entry=0x611319c0 <__fp_lock>)
>>      at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50
>> #8  0x61131dea in __fp_lock_all ()
>>      at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307
>> #9  0x610fa45e in pthread::atforkprepare ()
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031
>> #10 0x61076292 in lock_pthread (this=<synthetic pointer>)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137
>> #11 hold_everything (x=<synthetic pointer>, this=<synthetic pointer>)
>>      at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169
>> #12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582
>> Is there a better way to deal with this issue than using ERRORCHECK mutexes?
> Did you check if you get an error from pthread_mutex_lock on the
> second invocation of malloc?  Is it EDEADLK?  If so, you can
> ignore the error, but if you want to go ahead without adding lots
> of error checking you might be better off using a RECURSIVE mutex.

I didn't check the error, but it seemed clear from the code that this
is what was happening.  Yes, using a RECURSIVE mutex sounds like a good
idea.  Or maybe it would be just as good to remove the call to
pthread_atfork.  See my reply to Eric later in the thread.


