This is the mail archive of the
mailing list for the Cygwin project.
Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.1
- From: Ken Brown <kbrown at cornell dot edu>
- To: cygwin at cygwin dot com
- Date: Fri, 03 Jul 2015 09:09:42 -0400
On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
On Jul 2 15:25, Ken Brown wrote:
On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
On Jul 2 14:13, Corinna Vinschen wrote:
On Jul 1 22:10, Ken Brown wrote:
I may have spoken too soon. As I repeat the experiment on a different
computer, with a build from a slightly different snapshot of the emacs
trunk, emacs crashes when I type 'C-x d' with the following stack dump:
Frame Function Args
00100A3E240 00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
00030000002 001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
00000000000 00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
00000000000 21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
End of stack trace
$ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg
$ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg
That points to a crash while setting up the alternate stack. This is
always a possibility because, in contrast to the kernel signal handler
in a real POSIX system, the Cygwin exception handler is still running on
the stack which triggered the crash up to the point where we call the
signal handler function. Depending on how the stack overflow occurred,
this additional stack usage may be enough to kill the process for good.
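
For readers following along, the general POSIX mechanism being discussed
(an alternate signal stack plus a jump out of the SIGSEGV handler) can be
sketched roughly as follows. This is only a minimal sketch under stated
assumptions, not the Cygwin or emacs code: the names on_sigsegv,
install_overflow_handler, cap_stack_size, recurse and run_with_recovery
are hypothetical, the 64 KiB alternate stack and the recursion limit are
arbitrary, and it assumes a system where the kernel switches to the
alternate stack before the handler runs and reports a stack overflow as
SIGSEGV (some systems deliver SIGBUS instead).

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>

static sigjmp_buf recover_point;                 /* where the handler jumps back to */
static volatile sig_atomic_t overflow_count = 0; /* number of recovered faults */
static char alt_stack[64 * 1024];                /* hypothetical size for the alternate stack */

/* Runs on the alternate stack because the handler is installed with SA_ONSTACK. */
static void on_sigsegv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    overflow_count++;
    siglongjmp(recover_point, 1);
}

/* Optionally lower the soft stack limit so the demo overflows quickly. */
static int cap_stack_size(rlim_t bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > bytes) {
        rl.rlim_cur = bytes;
        return setrlimit(RLIMIT_STACK, &rl);
    }
    return 0;
}

/* Register the alternate stack, then the SIGSEGV handler that uses it. */
static int install_overflow_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsegv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Deep recursion; each frame touches a local buffer so it really uses stack. */
static int recurse(int depth, int limit)
{
    volatile char pad[1024];
    int r;

    pad[0] = (char)depth;
    if (depth >= limit)
        return 0;
    r = recurse(depth + 1, limit);
    return r + pad[0];
}

/* Returns 1 if a stack overflow was caught and recovered from, 0 otherwise. */
static int run_with_recovery(void)
{
    if (sigsetjmp(recover_point, 1) == 0) {
        recurse(0, 1 << 22);
        return 0;   /* the stack was large enough; nothing overflowed */
    }
    return 1;       /* arrived here via siglongjmp from the handler */
}
```

A caller would invoke cap_stack_size (optional), then
install_overflow_handler, then run_with_recovery. In this sketch the
handler does almost nothing before jumping away, which is the point made
above: whatever has to run before the switch to the alternate stack is
where an exception handler that still lives on the overflowed stack can
run out of room.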
Out of curiosity, can you add this to the init_sigsegv() function:
Of course this only works "per thread", so if init_sigsegv is called
for the main thread, only the main thread gets this treatment. For
testing this should be enough, though.
That didn't make any difference.
It should have. If you don't also tweak STACK_DANGER_ZONE accordingly,
handle_sigsegv should fail to call siglongjmp. Either way, I tested
it locally as well, and it doesn't work.
In the meantime I found that there's another problem. Assuming you
longjmp out of handle_sigsegv, the stack will still be "broken".
It doesn't have the usual guard pages anymore, and the next time
you have a stack overflow, NTDLL will simply terminate the process.
I created a wrapper function that resets the stack so it has valid guard
pages again and then the stack overflow can be handled repeatedly.
While I was at it, I found that the setup for pthread stacks is not
quite right, either, so right now I'm hacking on this stuff to make
it behave as expected in the usual cases.
But I do have a little more information.
I tried running emacs under gdb with a breakpoint at handle_sigsegv. The
breakpoint is hit when I deliberately trigger the stack overflow. Then I
continue, emacs says it has recovered from the stack overflow, and I type
'C-x d'. At this point there's a second SIGSEGV and handle_sigsegv is
called again. But this time garbage collection is in progress, and
handle_sigsegv just gives up.
Sounds right to me.
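
The decision described above (give up if garbage collection is in
progress, otherwise recover only if the fault looks like a stack
overflow) can be illustrated with a small sketch. This is not the emacs
source: the function names, the DANGER_ZONE value of 16 KiB, and the
exact bounds test are assumptions made for illustration; only the ideas
of a gc-in-progress check and a "danger zone" near the stack boundary
come from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width of the region near either end of the stack in which
   a fault is treated as a stack overflow. */
enum { DANGER_ZONE = 16 * 1024 };

/* True if the faulting address lies inside the stack range (lo, hi) and
   within DANGER_ZONE bytes of one of its boundaries. */
static bool looks_like_stack_overflow(uintptr_t fault, uintptr_t lo, uintptr_t hi)
{
    if (fault <= lo || fault >= hi)
        return false;
    return (fault - lo < DANGER_ZONE) || (hi - fault < DANGER_ZONE);
}

/* True if the handler may jump back to a recovery point; false means the
   signal has to be treated as fatal. */
static bool may_recover(bool gc_in_progress, uintptr_t fault,
                        uintptr_t lo, uintptr_t hi)
{
    if (gc_in_progress)
        return false;   /* heap state may be inconsistent; do not resume */
    return looks_like_stack_overflow(fault, lo, hi);
}
```

With a check of this shape, changing the usable stack size without
adjusting the danger zone makes the fault address fall outside the
expected region, so the handler would refuse to jump back.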
I don't know what caused the second SIGSEGV but I'll try to figure that out
when I next have a chance to look at this. I also don't know why the stack
dump pointed to a crash while setting up the alternate stack, since the
fatal crash actually seems to have happened later. But maybe the stack was
just completely messed up after the second SIGSEGV and the stack dump can't
I think I found the cause of that second SIGSEGV, and, if I'm right, it has
nothing to do with Cygwin. I think the problem was that in my testing, I forgot
to reset max-specpdl-size and max-lisp-eval-depth to reasonable values after the
recovery from stack overflow. If I do that, then I can no longer reproduce the
For the record, here's my complete elisp test case:
(setq max-specpdl-size 83200000
(defun foo () (foo))
;; The stack has now overflowed, and emacs has recovered.
(setq max-specpdl-size 1300
;; Can now continue working.