(call-process ...) hangs in emacs

Ken Brown kbrown@cornell.edu
Tue Aug 5 12:21:00 GMT 2014


On 8/4/2014 9:45 AM, Corinna Vinschen wrote:
> On Aug  4 09:34, Ken Brown wrote:
>> On 8/4/2014 4:00 AM, Corinna Vinschen wrote:
>>> On Aug  3 21:02, Ken Brown wrote:
>>>> On 8/1/2014 9:32 AM, Corinna Vinschen wrote:
>>>>> It could be a problem with the new default pthread mutexes being
>>>>> NORMAL, rather than ERRORCHECK mutexes.
>>>>
>>>> That does seem to be the problem, since I can reproduce the bug starting
>>>> with the 2014-07-14 snapshot.  More precisely, I can reproduce it using
>>>> emacs-nox (which is what the OP was using according to his cygcheck output)
>>>> but not using emacs-X11 or emacs-w32.
>>>>
>>>> I tried running emacs under gdb with a breakpoint at call_process, but all I
>>>> could see from that was that emacs tries to fork a subprocess and the call
>>>> to fork() never returns.  I also tried running it under strace, but again
>>>> all I can see is that fork() is called and then everything seems to be at a
>>>> standstill.
>>>>
>>>> Corinna, if you want to take a look, here's the precise recipe:
>>>>
>>>> 1. emacs-nox -Q [This should start emacs and put you in the *scratch*
>>>> buffer.]
>>>>
>>>> 2. Enter the following text into the buffer:
>>>>
>>>>    (call-process "pwd" nil t)
>>>>
>>>> 3. Position the cursor at the end of the line and type Ctrl-j.
>>>>
>>>> What should happen, and what does happen prior to the 2014-07-14 snapshot,
>>>> is that the current directory is displayed, followed by the exit code of 0.
>>>> What happens instead is that emacs appears to hang.
>>>
>>> How does emacs start a process?  Does it create a thread and then
>>> fork and exec from the thread?  Does it use its own pthread_mutex
>>> to control the job?  Is there a chance to create an STC of this
>>> process?
>>
>> emacs does some bookkeeping and then calls vfork.  It does not create a new
>> thread, nor does it create a pthread_mutex.  The only pthread_mutexes
>> created anywhere in the emacs source code are in its implementation of
>> malloc and friends, not in anything directly related to controlling
>> subprocesses.  (FWIW, this malloc implementation is used in the Cygwin build
>> of emacs but not in the Linux build.)
>
> Can you take a close look here?  This malloc will be used by Cygwin
> as well if it's implemented in the usual way and...
>
>> I did think about trying to create an STC, but I'm stymied because the
>> problem depends so strongly on how emacs is run:
>>
>>   - If emacs is run interactively, the problem only occurs with emacs-nox,
>> not with emacs-X11 or emacs-w32.
>>
>>   - If emacs is run non-interactively (i.e., in batch mode), the problem
>> occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out
>> earlier in the thread.
>>
>> I can't think of any way to capture these peculiarities in an STC.
>
> ...this, and the fact that fork/exec (vfork == fork on Cygwin) still
> works nicely in other scenarios, suggests that the application's usage
> of pthread_mutexes may be the culprit.
>
> For instance, is it possible that emacs expects the pthread_mutexes
> in malloc to be ERRORCHECK mutexes?  What if you explicitly set
> them to ERRORCHECK at creation time?

That doesn't seem to be the issue, but I think I did find the problem, 
and it looks like there might be both an emacs bug and a Cygwin bug. 
Here's the relevant code from emacs's gmalloc.c:

pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;

[...]

   /* Some pthread implementations call malloc for statically
      initialized mutexes when they are used first.  To avoid such a
      situation, we initialize mutexes here while their use is
      disabled in malloc etc.  */
   pthread_mutex_init (&_malloc_mutex, NULL);
   pthread_mutex_init (&_aligned_blocks_mutex, NULL);


The pthread_mutexes are initialized twice, once statically and once by
pthread_mutex_init, resulting in undefined behavior according to POSIX.
That's the emacs bug.  But simply removing 
the static initialization doesn't fix the problem.  On the other hand, 
the following patch does seem to fix it, at least in preliminary testing:

=== modified file 'src/gmalloc.c'
--- src/gmalloc.c       2014-03-04 19:02:49 +0000
+++ src/gmalloc.c       2014-08-05 01:35:38 +0000
@@ -490,8 +490,8 @@
  }

  #ifdef USE_PTHREAD
-pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
-pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t _malloc_mutex;
+pthread_mutex_t _aligned_blocks_mutex;
  int _malloc_thread_enabled_p;

  static void
@@ -526,8 +526,11 @@
       initialized mutexes when they are used first.  To avoid such a
       situation, we initialize mutexes here while their use is
       disabled in malloc etc.  */
-  pthread_mutex_init (&_malloc_mutex, NULL);
-  pthread_mutex_init (&_aligned_blocks_mutex, NULL);
+  pthread_mutexattr_t attr1, attr2;
+  pthread_mutexattr_settype (&attr1, PTHREAD_MUTEX_NORMAL);
+  pthread_mutexattr_settype (&attr2, PTHREAD_MUTEX_NORMAL);
+  pthread_mutex_init (&_malloc_mutex, &attr1);
+  pthread_mutex_init (&_aligned_blocks_mutex, &attr2);
    pthread_atfork (malloc_atfork_handler_prepare,
                   malloc_atfork_handler_parent,
                   malloc_atfork_handler_child);


The first hunk avoids the double initialization, but I don't understand 
why the second hunk does anything.  Since PTHREAD_MUTEX_NORMAL is now 
the default, shouldn't calling pthread_mutex_init with a NULL second
argument be equivalent to my calls to pthread_mutexattr_settype?  Does 
this indicate a Cygwin bug, or am I misunderstanding something?

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


