This is the mail archive of the cygwin mailing list for the Cygwin project.

On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
On Jul  2 15:25, Ken Brown wrote:
On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
On Jul  2 14:13, Corinna Vinschen wrote:
On Jul  1 22:10, Ken Brown wrote:
I may have spoken too soon.  As I repeat the experiment on a different
computer, with a build from a slightly different snapshot of the emacs
trunk, emacs crashes when I type 'C-x d' with the following stack dump:

Stack trace:
Frame        Function    Args
00100A3E240  00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
00030000002  001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
00000000000  00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
00000000000  21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
End of stack trace

$ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg

$ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg

That points to a crash while setting up the alternate stack.  This is
always a possibility because, in contrast to the kernel signal handler
in a real POSIX system, the Cygwin exception handler is still running on
the stack which triggered the crash up to the point where we call the
signal handler function.  Depending on how the stack overflow occurred,
this additional stack usage may be enough to kill the process for good.

Out of curiosity, can you add this to the init_sigsegv() function:

   #include <windows.h>
   init_sigsegv (void)
     ULONG guarantee = 65536;  /* SetThreadStackGuarantee takes a PULONG */
     SetThreadStackGuarantee (&guarantee);

Of course this only works "per thread", so if init_sigsegv is called
for the main thread, only the main thread gets this treatment.  For
testing this should be enough, though.

That didn't make any difference.

It should have.  If you don't also tweak STACK_DANGER_ZONE accordingly,
handle_sigsegv should fail to call siglongjmp.  Either way, I tested
it locally as well, and it doesn't work.

In the meantime I found that there's another problem.  Assuming you
longjmp out of handle_sigsegv, the stack will still be "broken".
It doesn't have the usual guard pages anymore, and the next time
you have a stack overflow, NTDLL will simply terminate the process.

I created a wrapper function which resets the stack so it has valid guard
pages again and then the stack overflow can be handled repeatedly.

While I was at it, I found that the setup for pthread stacks is not
quite right, either, so right now I'm hacking on this stuff to make
it behave as expected in the usual cases.

But I do have a little more information.
I tried running emacs under gdb with a breakpoint at handle_sigsegv.  The
breakpoint is hit when I deliberately trigger the stack overflow.  Then I
continue, emacs says it has recovered from the stack overflow, and I type
'C-x d'.  At this point there's a second SIGSEGV and handle_sigsegv is
called again.  But this time garbage collection is in progress, and
handle_sigsegv just gives up.

Sounds right to me.

I don't know what caused the second SIGSEGV but I'll try to figure that out
when I next have a chance to look at this.  I also don't know why the stack
dump pointed to a crash while setting up the alternate stack, since the
fatal crash actually seems to have happened later.  But maybe the stack was
just completely messed up after the second SIGSEGV and the stack dump can't
be trusted.

I think I found the cause of that second SIGSEGV, and, if I'm right, it
has nothing to do with Cygwin.  I think the problem was that in my
testing, I forgot to reset max-specpdl-size and max-lisp-eval-depth to
reasonable values after the recovery from stack overflow.  If I do that,
then I can no longer reproduce the crash.

For the record, here's my complete elisp test case:

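;; Raise the limits so the Lisp-level checks don't stop the recursion first.
(setq max-specpdl-size 83200000
      max-lisp-eval-depth 640000)
(defun foo () (foo))
;; Call it to trigger the overflow.
(foo)
;; The stack has now overflowed, and emacs has recovered.
;; Restore reasonable limits before doing anything else.
(setq max-specpdl-size 1300
      max-lisp-eval-depth 800)
;; Can now continue working.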
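
Why resetting the limits matters can be illustrated with a simplified model in C. This is only a sketch under assumptions, not emacs source: max_eval_depth is a hypothetical stand-in for max-lisp-eval-depth, and eval_nested stands in for a nested evaluation. With a sane limit the depth check fails softly long before the C stack is exhausted; with a huge limit the check never triggers in time and only the SIGSEGV path is left.

```c
/* Hypothetical stand-in for max-lisp-eval-depth. */
static long max_eval_depth = 800;

/* Models an evaluation that would nest forever.  Returns -1 when the
   depth limit is hit, which the caller can treat as an ordinary,
   recoverable error. */
static long eval_nested(long depth)
{
    if (depth > max_eval_depth)
        return -1;                  /* soft failure, stack still healthy */
    long result = eval_nested(depth + 1);
    return result;
}
```

With the limit at 800, eval_nested(0) returns -1 after a few hundred frames. Raising the limit far enough turns the same call into a real stack overflow, which is the situation the test case above creates on purpose.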

