This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: (call-process ...) hangs in emacs


On 8/4/2014 9:45 AM, Corinna Vinschen wrote:
On Aug  4 09:34, Ken Brown wrote:
On 8/4/2014 4:00 AM, Corinna Vinschen wrote:
On Aug  3 21:02, Ken Brown wrote:
On 8/1/2014 9:32 AM, Corinna Vinschen wrote:
It could be a problem with the new default pthread mutexes being
NORMAL, rather than ERRORCHECK mutexes.

That does seem to be the problem, since I can reproduce the bug starting
with the 2014-07-14 snapshot.  More precisely, I can reproduce it using
emacs-nox (which is what the OP was using according to his cygcheck output)
but not using emacs-X11 or emacs-w32.

I tried running emacs under gdb with a breakpoint at call_process, but all I
could see from that is that emacs tries to fork a subprocess, but the call
to fork() never returns.  I also tried running it under strace, but again
all I can see is that fork() is called and then everything seems to be at a
standstill.

Corinna, if you want to take a look, here's the precise recipe:

1. emacs-nox -Q [This should start emacs and put you in the *scratch*
buffer.]

2. Enter the following text into the buffer:

   (call-process "pwd" nil t)

3. Position the cursor at the end of the line and type Ctrl-j.

What should happen, and what does happen prior to the 2014-07-14 snapshot,
is that the current directory is displayed, followed by the exit code of 0.
What happens instead is that emacs appears to hang.

How does emacs start a process?  Does it create a thread and then
fork and exec from the thread?  Does it use its own pthread_mutex
to control the job?  Is there a chance to create an STC of this
process?

emacs does some bookkeeping and then calls vfork.  It does not create a new
thread, nor does it create a pthread_mutex.  The only pthread_mutexes
created anywhere in the emacs source code are in its implementation of
malloc and friends, not in anything directly related to controlling
subprocesses.  (FWIW, this malloc implementation is used in the Cygwin build
of emacs but not in the Linux build.)
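
For reference, the pattern exercised by (call-process "pwd" nil t) can be
sketched in plain C as below.  This is only an illustration, not emacs's
code: run_and_capture is a hypothetical helper, it calls fork() rather than
vfork(), and it leaves out the bookkeeping emacs does first.  It might serve
as a skeleton for an STC, but on its own it does not involve emacs's malloc
mutexes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not emacs code: run PROG with no arguments,
   copy up to BUFSIZE-1 bytes of its stdout into BUF, and return its
   exit status, or -1 on failure.  */
static int
run_and_capture (const char *prog, char *buf, size_t bufsize)
{
  int fds[2];
  if (bufsize == 0 || pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();   /* the call that never returned in the report */
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: route stdout into the pipe, then exec the program.  */
      close (fds[0]);
      dup2 (fds[1], STDOUT_FILENO);
      close (fds[1]);
      execlp (prog, prog, (char *) NULL);
      _exit (127);       /* reached only if exec failed */
    }

  /* Parent: read until EOF or until BUF is full, then reap the child.  */
  close (fds[1]);
  size_t used = 0;
  ssize_t n;
  while (used < bufsize - 1
         && (n = read (fds[0], buf + used, bufsize - 1 - used)) > 0)
    used += (size_t) n;
  buf[used] = '\0';
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

On a working system, run_and_capture ("pwd", buf, sizeof buf) should return
0 and leave the current directory in buf, which corresponds to the expected
result described in the recipe.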

Can you take a close look here?  This malloc will be used by Cygwin
as well if it's implemented in the usual way and...

I did think about trying to create an STC, but I'm stymied because the
problem depends so strongly on how emacs is run:

  - If emacs is run interactively, the problem only occurs with emacs-nox,
not with emacs-X11 or emacs-w32.

  - If emacs is run non-interactively (i.e., in batch mode), the problem
occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out
earlier in the thread.

I can't think of any way to capture these peculiarities in an STC.

...this, and the fact that fork/exec (vfork == fork on Cygwin) still
works nicely in other scenarios, suggests that some problem with the
usage of pthread_mutexes in the application may be the culprit.

For instance, is it possible that emacs expects the pthread_mutexes
in malloc to be ERRORCHECK mutexes?  What if you explicitly set
them to ERRORCHECK at creation time?

That doesn't seem to be the issue, but I think I did find the problem, and it looks like there might be both an emacs bug and a Cygwin bug. Here's the relevant code from emacs's gmalloc.c:

pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;

[...]

  /* Some pthread implementations call malloc for statically
     initialized mutexes when they are used first.  To avoid such a
     situation, we initialize mutexes here while their use is
     disabled in malloc etc.  */
  pthread_mutex_init (&_malloc_mutex, NULL);
  pthread_mutex_init (&_aligned_blocks_mutex, NULL);


The pthread_mutexes are initialized twice, resulting in undefined behavior according to POSIX. That's the emacs bug. But simply removing the static initialization doesn't fix the problem. On the other hand, the following patch does seem to fix it, at least in preliminary testing:

=== modified file 'src/gmalloc.c'
--- src/gmalloc.c       2014-03-04 19:02:49 +0000
+++ src/gmalloc.c       2014-08-05 01:35:38 +0000
@@ -490,8 +490,8 @@
 }

 #ifdef USE_PTHREAD
-pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
-pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t _malloc_mutex;
+pthread_mutex_t _aligned_blocks_mutex;
 int _malloc_thread_enabled_p;

 static void
@@ -526,8 +526,11 @@
      initialized mutexes when they are used first.  To avoid such a
      situation, we initialize mutexes here while their use is
      disabled in malloc etc.  */
-  pthread_mutex_init (&_malloc_mutex, NULL);
-  pthread_mutex_init (&_aligned_blocks_mutex, NULL);
+  pthread_mutexattr_t attr1, attr2;
+  pthread_mutexattr_settype (&attr1, PTHREAD_MUTEX_NORMAL);
+  pthread_mutexattr_settype (&attr2, PTHREAD_MUTEX_NORMAL);
+  pthread_mutex_init (&_malloc_mutex, &attr1);
+  pthread_mutex_init (&_aligned_blocks_mutex, &attr2);
   pthread_atfork (malloc_atfork_handler_prepare,
                  malloc_atfork_handler_parent,
                  malloc_atfork_handler_child);


The first hunk avoids the double initialization, but I don't understand why the second hunk does anything. Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling pthread_mutex_init with a NULL second argument be equivalent to my calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug, or am I misunderstanding something?

Ken


