[ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.1

Ken Brown kbrown@cornell.edu
Fri Jul 3 13:09:00 GMT 2015


On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
> On Jul  2 15:25, Ken Brown wrote:
>> On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
>>> On Jul  2 14:13, Corinna Vinschen wrote:
>>>> On Jul  1 22:10, Ken Brown wrote:
>>>>> I may have spoken too soon.  As I repeat the experiment on a different
>>>>> computer, with a build from a slightly different snapshot of the emacs
>>>>> trunk, emacs crashes when I type 'C-x d' with the following stack dump:
>>>>>
>>>>> Stack trace:
>>>>> Frame        Function    Args
>>>>> 00100A3E240  00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
>>>>> 00030000002  001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
>>>>> 00000000000  00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>> 00000000000  21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>> End of stack trace
>>>>>
>>>>> $ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exception.h:175
>>>>>
>>>>> $ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exceptions.cc:1639
>>>>
>>>> That points to a crash while setting up the alternate stack.  This is
>>>> always a possibility because, in contrast to the kernel signal handler
>>>> in a real POSIX system, the Cygwin exception handler is still running on
>>>> the stack which triggered the crash up to the point where we call the
>>>> signal handler function.  Depending on how the stack overflow occurred,
>>>> this additional stack usage may be enough to kill the process for good.
>>>>
>>>> Out of curiosity, can you add this to the init_sigsegv() function:
>>>>
>>>>    #include <windows.h>
>>>>    [...]
>>>>    init_sigsegv (void)
>>>>    {
>>>>      [...]
>>>>      ULONG guarantee = 65536;  /* the API takes a PULONG, not a value */
>>>>      SetThreadStackGuarantee (&guarantee);
>>>
>>> Of course this only works "per thread", so if init_sigsegv is called
>>> for the main thread, only the main thread gets this treatment.  For
>>> testing this should be enough, though.
>>
>> That didn't make any difference.
>
> It should have.  If you don't also tweak STACK_DANGER_ZONE accordingly,
> handle_sigsegv should fail to call siglongjmp.  Either way, I tested
> it locally as well, and it doesn't work.
>
> In the meantime I found that there's another problem.  Assuming you
> longjmp out of handle_sigsegv, the stack will still be "broken".
> It doesn't have the usual guard pages anymore, and the next time
> you have a stack overflow, NTDLL will simply terminate the process.
>
> I created a wrapper function which resets the stack so it has valid guard
> pages again, so that a stack overflow can be handled repeatedly.
>
> While I was at it, I found that the setup for pthread stacks is not
> quite right, either, so right now I'm hacking on this stuff to make
> it behave as expected in the usual cases.
>
>> But I do have a little more information.
>> I tried running emacs under gdb with a breakpoint at handle_sigsegv.  The
>> breakpoint is hit when I deliberately trigger the stack overflow.  Then I
>> continue, emacs says it has recovered from the stack overflow, and I type
>> 'C-x d'.  At this point there's a second SIGSEGV and handle_sigsegv is
>> called again.  But this time garbage collection is in progress, and
>> handle_sigsegv just gives up.
>
> Sounds right to me.
>
>> I don't know what caused the second SIGSEGV but I'll try to figure that out
>> when I next have a chance to look at this.  I also don't know why the stack
>> dump pointed to a crash while setting up the alternate stack, since the
>> fatal crash actually seems to have happened later.  But maybe the stack was
>> just completely messed up after the second SIGSEGV and the stack dump can't
>> be trusted.

I think I found the cause of that second SIGSEGV, and, if I'm right, it
has nothing to do with Cygwin.  I think the problem was that in my
testing, I forgot to reset max-specpdl-size and max-lisp-eval-depth to
reasonable values after the recovery from stack overflow.  If I do that,
then I can no longer reproduce the crash.

For the record, here's my complete elisp test case:

(setq max-specpdl-size 83200000
      max-lisp-eval-depth 640000)
(defun foo () (foo))
(foo)
;; The stack has now overflowed, and emacs has recovered.
(setq max-specpdl-size 1300
      max-lisp-eval-depth 800)
;; Can now continue working.
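
For anyone following along, here is a minimal C sketch of the recovery
pattern discussed above as it looks on a POSIX system: the handler runs
on an alternate stack (sigaltstack plus SA_ONSTACK) and leaves via
siglongjmp.  This is not the emacs or Cygwin code.  The names
install_overflow_handler, overflow_and_recover and recurse, and the
64 KiB alternate stack size, are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for sigaltstack, sigaction, sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration only. */
static sigjmp_buf recovery_point;
static volatile sig_atomic_t overflow_count = 0;
static volatile int keep_recursing = 1;

/* Runs on the alternate stack, because the overflowed stack is unusable. */
static void on_overflow(int signo)
{
    (void)signo;
    overflow_count++;
    siglongjmp(recovery_point, 1);   /* jump back to the saved context */
}

/* Register an alternate stack and a handler that uses it.
   Returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    static char alt_stack[65536];    /* assumed size, chosen arbitrarily */
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_overflow;
    sa.sa_flags = SA_ONSTACK;        /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    /* Some systems report a stack overflow as SIGBUS instead. */
    return sigaction(SIGBUS, &sa, NULL);
}

/* Unbounded recursion; the volatile buffer keeps each frame on the stack
   and the use of pad[0] after the call prevents tail-call elimination. */
static int recurse(int depth)
{
    volatile char pad[1024];
    pad[0] = (char)depth;
    if (!keep_recursing)
        return 0;
    return recurse(depth + 1) + pad[0];
}

/* Overflow the stack on purpose; returns 1 once the handler has jumped
   back here.  Saving the signal mask (second argument 1) means the
   signal is unblocked again after the jump. */
int overflow_and_recover(void)
{
    if (sigsetjmp(recovery_point, 1) == 0) {
        recurse(0);
        return 0;                    /* only if recursion somehow ended */
    }
    return 1;
}
```

Call install_overflow_handler() once and then overflow_and_recover() as
often as you like.  As noted above, on Windows the guard pages also have
to be restored after the jump before a second overflow can be caught;
this sketch does not attempt that.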

Ken
