On Jul  2 15:25, Ken Brown wrote:
> On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
> >On Jul  2 14:13, Corinna Vinschen wrote:
> >>On Jul  1 22:10, Ken Brown wrote:
> >>>I may have spoken too soon.  As I repeat the experiment on a different
> >>>computer, with a build from a slightly different snapshot of the emacs
> >>>trunk, emacs crashes when I type 'C-x d' with the following stack dump:
> >>>
> >>>Stack trace:
> >>>Frame        Function    Args
> >>>00100A3E240  00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
> >>>00030000002  001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
> >>>00000000000  00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
> >>>00000000000  21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
> >>>End of stack trace
> >>>
> >>>$ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg
> >>>/usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exception.h:175
> >>>
> >>>$ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg
> >>>/usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/
> >>
> >>That points to a crash while setting up the alternate stack.  This is
> >>always a possibility because, in contrast to the kernel signal handler
> >>in a real POSIX system, the Cygwin exception handler is still running on
> >>the stack which triggered the crash up to the point where we call the
> >>signal handler function.  Depending on how the stack overflow occurred,
> >>this additional stack usage may be enough to kill the process for good.
> >>
> >>Out of curiosity, can you add this to the init_sigsegv() function:
> >>
> >>   #include <windows.h>
> >>   [...]
> >>   init_sigsegv (void)
> >>   {
> >>     [...]
> >>     ULONG guarantee = 65536;   /* the API takes a pointer to the size */
> >>     SetThreadStackGuarantee (&guarantee);
> >
> >Of course this only works "per thread", so if init_sigsegv is called
> >for the main thread, only the main thread gets this treatment.  For
> >testing this should be enough, though.
> That didn't make any difference.

It should have.  If you don't also tweak STACK_DANGER_ZONE accordingly,
handle_sigsegv should fail to call siglongjmp.  Either way, I tested
it locally as well, and it doesn't work.

In the meantime I found that there's another problem.  Assuming you
longjmp out of handle_sigsegv, the stack will still be "broken".
It doesn't have the usual guard pages anymore, and the next time
you have a stack overflow, NTDLL will simply terminate the process.

I created a wrapper function which resets the stack so it has valid guard
pages again, so that a stack overflow can be handled repeatedly.
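
For comparison, here is a minimal sketch of the recovery pattern under
discussion as it looks on a POSIX system, where the kernel delivers the
signal on an alternate stack and the normal stack stays usable after the
longjmp.  This is neither the Cygwin wrapper nor the Emacs handle_sigsegv;
the names overflow_and_recover, recurse and ALT_STACK_SIZE are made up for
illustration, and the sketch assumes a finite stack size limit (the usual
default).

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)   /* hypothetical size for the handler stack */

static sigjmp_buf recovery_point;    /* where the handler jumps back to */

/* Runs on the alternate stack, so it still works when the normal
   stack is exhausted.  It only jumps back to the recovery point. */
static void on_overflow(int sig)
{
    (void)sig;
    siglongjmp(recovery_point, 1);
}

/* Unbounded recursion.  The volatile buffer forces a real stack frame
   per call, and the addition after the call prevents tail-call
   elimination. */
static int recurse(int depth)
{
    volatile char pad[1024];
    pad[0] = (char)depth;
    return recurse(depth + 1) + pad[0];
}

/* Triggers a stack overflow and recovers from it.
   Returns 1 after a successful recovery, 0 if the setup failed. */
int overflow_and_recover(void)
{
    static char *alt_stack_mem;

    if (alt_stack_mem == NULL) {
        stack_t ss;
        struct sigaction sa;

        alt_stack_mem = malloc(ALT_STACK_SIZE);
        if (alt_stack_mem == NULL)
            return 0;

        memset(&ss, 0, sizeof ss);
        ss.ss_sp = alt_stack_mem;
        ss.ss_size = ALT_STACK_SIZE;
        ss.ss_flags = 0;
        if (sigaltstack(&ss, NULL) != 0)
            return 0;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_overflow;
        sa.sa_flags = SA_ONSTACK;    /* deliver on the alternate stack */
        sigemptyset(&sa.sa_mask);
        /* Some systems report a stack overflow as SIGBUS. */
        if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
            sigaction(SIGBUS, &sa, NULL) != 0)
            return 0;
    }

    /* Second argument 1: save the signal mask, so siglongjmp unblocks
       the signal again and a later overflow can also be caught. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        recurse(0);
        return 0;                    /* not reached */
    }
    return 1;                        /* arrived here via siglongjmp */
}
```

The point of the sketch is the second call: because the signal mask is
restored and the handler never ran on the overflowed stack, calling
overflow_and_recover() again is expected to succeed as well.  That
repeatability is what the guard page reset is meant to provide on Cygwin.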

While I was at it, I found that the setup for pthread stacks is not
quite right, either, so right now I'm hacking on this stuff to make
it behave as expected in the usual cases.

> But I do have a little more information.
> I tried running emacs under gdb with a breakpoint at handle_sigsegv.  The
> breakpoint is hit when I deliberately trigger the stack overflow.  Then I
> continue, emacs says it has recovered from the stack overflow, and I type
> 'C-x d'.  At this point there's a second SIGSEGV and handle_sigsegv is
> called again.  But this time garbage collection is in progress, and
> handle_sigsegv just gives up.

Sounds right to me.

> I don't know what caused the second SIGSEGV but I'll try to figure that out
> when I next have a chance to look at this.  I also don't know why the stack
> dump pointed to a crash while setting up the alternate stack, since the
> fatal crash actually seems to have happened later.  But maybe the stack was
> just completely messed up after the second SIGSEGV and the stack dump can't
> be trusted.
> More later.



Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

