This is the mail archive of the
mailing list for the Cygwin project.
Re: (call-process ...) hangs in emacs
- From: Ken Brown <kbrown at cornell dot edu>
- To: cygwin at cygwin dot com
- Date: Thu, 07 Aug 2014 14:53:50 -0400
- Subject: Re: (call-process ...) hangs in emacs
On 8/7/2014 8:51 AM, Corinna Vinschen wrote:
> On Aug 7 07:51, Ken Brown wrote:
>> On 8/5/2014 2:40 PM, Corinna Vinschen wrote:
>>> I'm glad to read that, but I'm still a little bit concerned. If your
>>> code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you
>>> *might* miss an error case.
>>> I'd suggest tweaking the pthread_mutex_lock/unlock calls and logging the
>>> threads calling them. It looks like the same thread calls malloc from
>>> malloc for some reason and it might be interesting to learn how that
>>> happens and if it's really ok in this scenario, because it seems to
>>> be unexpected by the code.
>> I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork
>> after initializing the mutexes, and the resulting 'prepare' handler locks
>> the mutexes. (The parent and child handlers unlock them.) So when emacs
>> calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL
>> calls calloc, leading to a deadlock. Here's a gdb backtrace showing the
>> sequence of calls:
> First question: Why does emacs use its own malloc on Cygwin rather
> than the system-provided one? Is that really necessary?
Cygwin's malloc lacks a few features that emacs requires because of the
unusual way emacs is built. The most important such features (or maybe
even the only ones) are malloc_set_state and malloc_get_state.
>> #0 malloc (size=size@entry=40) at gmalloc.c:919
>> #1 0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510
>> #2 0x61082074 in calloc (nmemb=1, size=40)
>> #3 0x61003177 in operator new (s=s@entry=40)
>> #4 0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 <reent_data+852>,
>> #5 0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 <reent_data+852>)
>> #6 0x611319d8 in __fp_lock (ptr=0x61187cd0 <reent_data+752>)
> Right, __fp_lock needs a pthread lock and since this lock hasn't been
> used yet, it has to create it. The pthread_mutex creation calls the
> new operator which in turn calls calloc.
>> #7 0x61154f75 in _fwalk (ptr=0x28d544,
>> #8 0x61131dea in __fp_lock_all ()
>> #9 0x610fa45e in pthread::atforkprepare ()
>> #10 0x61076292 in lock_pthread (this=<synthetic pointer>)
>> #11 hold_everything (x=<synthetic pointer>, this=<synthetic pointer>)
>> #12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582
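For readers following along, the sequence in the backtrace can be imitated in a small, portable C sketch. This is not emacs or Cygwin code, and every name in it (alloc_lock, lock_allocator, allocate_during_fork, fork_relock_result) is hypothetical. It assumes the POSIX rule that 'prepare' handlers run in reverse order of registration, so a handler registered first stands in for the allocation that happens inside fork after the lock has already been taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t alloc_lock;  /* stands in for the allocator's mutex */
static int relock_rc = -1;          /* result of the nested lock attempt */

/* 'prepare' handler that takes the lock, as described in the thread. */
static void lock_allocator(void)   { (void) pthread_mutex_lock(&alloc_lock); }

/* 'parent' and 'child' handler; the return value is deliberately ignored. */
static void unlock_allocator(void) { (void) pthread_mutex_unlock(&alloc_lock); }

/* Stand-in for an allocation made during fork(): lock the mutex again. */
static void allocate_during_fork(void)
{
    relock_rc = pthread_mutex_lock(&alloc_lock);
    if (relock_rc == 0)             /* only succeeds for a RECURSIVE mutex */
        (void) pthread_mutex_unlock(&alloc_lock);
}

/* Fork once with a mutex of the given type and report what the nested
   lock attempt returned.  Do NOT pass PTHREAD_MUTEX_NORMAL: the nested
   lock would then block forever, which is the hang under discussion. */
int fork_relock_result(int type)
{
    static int handlers_registered = 0;
    pthread_mutexattr_t attr;
    pid_t pid;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, type);
    assert(rc == 0);
    rc = pthread_mutex_init(&alloc_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    if (!handlers_registered) {
        /* Registered first, so this 'prepare' handler runs last. */
        rc = pthread_atfork(allocate_during_fork, NULL, NULL);
        assert(rc == 0);
        /* Registered second, so this 'prepare' handler runs first. */
        rc = pthread_atfork(lock_allocator, unlock_allocator, unlock_allocator);
        assert(rc == 0);
        handlers_registered = 1;
    }

    relock_rc = -1;
    pid = fork();
    if (pid == 0)
        _exit(0);                   /* child has nothing else to do */
    if (pid > 0)
        waitpid(pid, NULL, 0);

    pthread_mutex_destroy(&alloc_lock);
    return relock_rc;
}
```

Calling fork_relock_result(PTHREAD_MUTEX_ERRORCHECK) should report EDEADLK from the nested lock, and PTHREAD_MUTEX_RECURSIVE should report 0. The sketch only mimics the ordering of events; it says nothing about how the Cygwin DLL or gmalloc.c are actually structured.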
>> Is there a better way to deal with this issue than using ERRORCHECK mutexes?
> Did you check if you get an error from pthread_mutex_lock on the
> second invocation of malloc? Is it EDEADLK? If so, you can
> ignore the error, but if you want to go ahead without adding lots
> of error checking you might be better off using a RECURSIVE mutex.
I didn't check the error, but it seemed clear from the code that that
was what was happening. Yes, using a RECURSIVE mutex sounds like a good
idea. Or maybe it would be just as good to remove the call to
pthread_atfork. See my reply to Eric later in the thread.
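
As a rough illustration of the difference between the mutex types mentioned above, here is a minimal C sketch that locks a mutex twice from the same thread and returns what the second pthread_mutex_lock call reported. The helper name relock_result is made up for this example; only the standard POSIX calls and constants are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a mutex of the given type twice from one thread and return the
   result of the second lock call.  Do NOT pass PTHREAD_MUTEX_NORMAL:
   the second lock would block forever. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc, second;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, type);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    second = pthread_mutex_lock(&m);    /* the nested lock attempt */

    /* A RECURSIVE mutex must be unlocked once per successful lock. */
    if (second == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With PTHREAD_MUTEX_ERRORCHECK the second call should fail with EDEADLK, which the caller then has to check for and ignore; with PTHREAD_MUTEX_RECURSIVE it should return 0, so no extra error handling is needed at the nested call.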