On 7/3/2015 9:09 AM, Ken Brown wrote:
On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
On Jul  2 15:25, Ken Brown wrote:
On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
On Jul  2 14:13, Corinna Vinschen wrote:
On Jul  1 22:10, Ken Brown wrote:
I may have spoken too soon.  As I repeat the experiment on a different
computer, with a build from a slightly different snapshot of the
trunk, emacs crashes when I type 'C-x d' with the following stack
dump:

Stack trace:
Frame        Function    Args
00100A3E240  00180071CC3 (00000829630, 000008296D0, 00000000000,
00030000002  001800732BE (00000000000, 00000000002, 00100A48C80,
00000000000  00000006B40 (00000000002, 00100A48C80, 00000000002,
00000000000  21000000003 (00000000002, 00100A48C80, 00000000002,
End of stack trace

$ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg

$ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg

That points to a crash while setting up the alternate stack.  This is
always a possibility because, in contrast to the kernel signal handler
in a real POSIX system, the Cygwin exception handler is still running on
the stack which triggered the crash up to the point where we call the
signal handler function.  Depending on how the stack overflow occurred,
this additional stack usage may be enough to kill the process for
good.
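
For readers of the archive, here is a minimal sketch of the POSIX mechanism
under discussion: a SIGSEGV handler that runs on an alternate stack and
jumps back to a recovery point.  It is not the Emacs or Cygwin code; every
name in it (recover_point, on_sigsegv, install_altstack_handler, recurse,
survive_overflow) is made up for illustration, and it assumes a system
that reports stack overflow as SIGSEGV.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;               /* hypothetical name */
static volatile sig_atomic_t overflow_count;   /* overflows caught so far */

/* Runs on the alternate stack, so it still has room to execute even
   though the normal stack is exhausted.  */
static void
on_sigsegv (int sig)
{
  (void) sig;
  overflow_count++;
  siglongjmp (recover_point, 1);
}

/* Register an alternate stack and a SIGSEGV handler that uses it.
   Returns 0 on success, -1 on failure.  */
static int
install_altstack_handler (void)
{
  static char altstack[65536];   /* size is an assumption for this sketch */
  stack_t ss;
  struct sigaction sa;

  ss.ss_sp = altstack;
  ss.ss_size = sizeof altstack;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, NULL) != 0)
    return -1;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_sigsegv;
  sa.sa_flags = SA_ONSTACK;      /* deliver the signal on the alternate stack */
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGSEGV, &sa, NULL);
}

/* Recurse with roughly 1 KiB per frame until LIMIT is reached.  With a
   huge LIMIT the stack is expected to run out first.  */
static long
recurse (long depth, long limit)
{
  volatile char pad[1024];
  pad[0] = (char) depth;
  if (depth >= limit)
    return pad[0];
  return recurse (depth + 1, limit) + pad[0];
}

/* Trigger a stack overflow on purpose.  Returns 1 if the handler caught
   it and control came back here, 0 if the recursion simply finished.  */
static int
survive_overflow (void)
{
  if (sigsetjmp (recover_point, 1) == 0)
    {
      recurse (0, 1L << 30);
      return 0;
    }
  return 1;
}
```

The second argument of 1 to sigsetjmp saves the signal mask, so SIGSEGV is
unblocked again after the jump and a later overflow can be caught too.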

Out of curiosity, can you add this to the init_sigsegv() function:

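   #include <windows.h>
   init_sigsegv (void)
   {
     /* SetThreadStackGuarantee takes a pointer to the requested size.  */
     ULONG stack_guarantee = 65536;
     SetThreadStackGuarantee (&stack_guarantee);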

Of course this only works "per thread", so if init_sigsegv is called
for the main thread, only the main thread gets this treatment.  For
testing this should be enough, though.

That didn't make any difference.

It should have.  If you don't also tweak STACK_DANGER_ZONE accordingly,
handle_sigsegv should fail to call siglongjmp.  Either way, I tested
it locally as well, and it doesn't work.

In the meantime I found that there's another problem.  Assuming you
longjmp out of handle_sigsegv, the stack will still be "broken".
It doesn't have the usual guard pages anymore, and the next time
you have a stack overflow, NTDLL will simply terminate the process.

I created a wrapper function which resets the stack so it has valid guard
pages again, and then the stack overflow can be handled repeatedly.

While I was at it, I found that the setup for pthread stacks is not
quite right, either, so right now I'm hacking on this stuff to make
it behave as expected in the usual cases.

But I do have a little more information.
I tried running emacs under gdb with a breakpoint at handle_sigsegv.
The breakpoint is hit when I deliberately trigger the stack overflow.
Then I continue, emacs says it has recovered from the stack overflow,
and I type 'C-x d'.  At this point there's a second SIGSEGV and
handle_sigsegv is called again.  But this time garbage collection is in
progress, and handle_sigsegv just gives up.
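
The decision described here can be sketched as a small pure function: give
up if garbage collection is in progress, otherwise recover only when the
faulting address lies on the stack and close to one of its ends.  This is
an illustration under assumptions, not the actual handle_sigsegv; the
function name can_recover, its parameters, and the 16 KiB value used for
DANGER_ZONE (standing in for STACK_DANGER_ZONE) are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for STACK_DANGER_ZONE; the value is an assumption.  */
enum { DANGER_ZONE = 16 * 1024 };

/* Return true if a fault at ADDR looks like a recoverable stack overflow.
   BEG and END are the lowest and highest addresses of the stack region.  */
static bool
can_recover (uintptr_t addr, uintptr_t beg, uintptr_t end, bool gc_in_progress)
{
  /* An overflow during garbage collection is treated as fatal.  */
  if (gc_in_progress)
    return false;

  /* The fault must be somewhere on the stack at all.  */
  if (!(beg < addr && addr < end))
    return false;

  /* Close to either boundary: most likely a stack overflow.  */
  return addr - beg < DANGER_ZONE || end - addr < DANGER_ZONE;
}
```

A handler built on such a check would call siglongjmp when it returns true
and let the signal be fatal otherwise.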

Sounds right to me.

I don't know what caused the second SIGSEGV but I'll try to figure that
out when I next have a chance to look at this.  I also don't know why the
dump pointed to a crash while setting up the alternate stack, since the
fatal crash actually seems to have happened later.  But maybe the stack
was just completely messed up after the second SIGSEGV and the stack dump
can't be trusted.

I think I found the cause of that second SIGSEGV, and, if I'm right, it
has nothing to do with Cygwin.  I think the problem was that in my
testing, I forgot to reset max-specpdl-size and max-lisp-eval-depth to
reasonable values after the recovery from stack overflow.  If I do that,
then I can no longer reproduce the crash.

Just for the sake of the archives, it turned out that I could reproduce that second crash after all. But it was an emacs bug, which has now been fixed:

So there are no loose ends; everything I know how to test involving the alternate stack works.


