This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4


On 7/6/2015 9:15 AM, Ken Brown wrote:
Hi Corinna,

On 7/6/2015 6:01 AM, Corinna Vinschen wrote:
Hi Ken,


thanks for further testing this.


On Jul  5 22:15, Ken Brown wrote:
On 7/5/2015 5:34 PM, Corinna Vinschen wrote:
This test release needs some good testing!

I repeated the emacs experiment discussed in the "[ANNOUNCEMENT] TEST
RELEASE: Cygwin 2.1.0-0.1" thread.  In the 32-bit case, the results were
more-or-less the same as before: I forced a stack overflow, emacs recovered,
I tried to continue working, there was a second SIGSEGV, and handle_sigsegv
bailed out because garbage collection was in progress.  This time I was
unable to prevent the second SIGSEGV by resetting max-specpdl-size and
max-lisp-eval-depth.  I'm not sure what caused the second SIGSEGV, but it
might have nothing to do with Cygwin.

In the 64-bit case, however, the recovery from stack overflow never happened
(i.e., the program never reached the siglongjmp).  Here's a gdb session:
[...]
1647          if (!getrlimit (RLIMIT_STACK, &rlim))
(gdb)
1656              beg = stack_bottom;
(gdb)
1657              end = stack_bottom + stack_direction * rlim.rlim_cur;
(gdb)
1658              if (beg > end)
(gdb)
1660              addr = (char *) siginfo->si_addr;
(gdb)
1663              if (beg < addr && addr < end
(gdb) p beg
$1 = 0x82ca27 ""
(gdb) p addr
$2 = 0x33ff8 ""

I can't reproduce this.  It works fine for me.  For reference I attached
my simplified testcase again.  It's basically the emacs SIGSEGV setup:
main triggers the stack overflow, the handler tries to write a file to
test whether that works from inside the handler, then it siglongjmps.
The main function tests whether it can still fork, and then it repeats
the action to test whether signal handling is back to normal.

If it works (and it does for me) the output looks like this:

   $ ./sigalt
   command loop 1 before crash
   command loop 1 after crash
   In child
   In parent
   command loop 2 before crash
   command loop 2 after crash
   In child
   In parent

On W8.1 for a standard GCC build of this testcase I get:

   (gdb) p beg
   $1 = 0x40ac3 <error: Cannot access memory at address 0x40ac3>
   (gdb) p addr
   $2 = 0x43848 <error: Cannot access memory at address 0x43848>
   (gdb) p end
   $3 = 0x23cac3 ""
   (gdb) p/x rlim.rlim_cur
   $5 = 0x1fc000

Check default stacksize:

   $ peflags -x ./sigalt
   ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

   0x200000 - dead zone 4K - default W8.1 64 bit guardpagesize 3 * 4K ==
   0x1fc000, the value rlim.rlim_cur returns.  Looks good to me.

On W8.1 32 bit under WOW:

   (gdb) p beg
   $1 = 0x8fc33 ""
   (gdb) p addr
   $2 = 0x92d5c <error: Cannot access memory at address 0x92d5c>
   (gdb) p end
   $3 = 0x28cc33 ""
   (gdb) p/x rlim.rlim_cur
   $4 = 0x1fd000

   $ peflags -x ./sigalt
   ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

   0x200000 - dead zone 4K - default W8.1 32 bit guardpagesize 2 * 4K ==
   0x1fd000.

On W7 32 bit native:

   (gdb) p beg
   $1 = 0x2ec43 "\376\356..."
   (gdb) p addr
   $2 = 0x32d6c ""
   (gdb) p end
   $3 = 0x22cc43 ""
   (gdb) p rlim.rlim_cur
   $4 = 2088960
   (gdb) p/x rlim.rlim_cur
   $5 = 0x1fe000

   $ peflags -x ./sigalt
   ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

   0x200000 - dead zone 4K - default W7 32 bit guardpagesize 1 * 4K ==
   0x1fe000.
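
The three rlim_cur values above all follow the same arithmetic: stack
reserve size, minus the 4K dead zone, minus the guard pages.  A minimal
C sketch of that calculation follows.  The names expected_rlim_cur and
SKETCH_PAGE_SIZE are made up for illustration; this is not Cygwin's
actual code, only the arithmetic as described in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: dead zone and guard pages are 4K each,
   as in the three cases quoted above.  */
#define SKETCH_PAGE_SIZE 0x1000u

/* Hypothetical helper (not a Cygwin function): usable stack size as
   reported in rlim_cur, given the PE stack reserve size and the number
   of guard pages the OS uses by default.  */
static uint64_t
expected_rlim_cur (uint64_t stack_reserve, unsigned guard_pages)
{
  uint64_t dead_zone = SKETCH_PAGE_SIZE;
  uint64_t guard = (uint64_t) guard_pages * SKETCH_PAGE_SIZE;

  /* The reserve must at least hold the dead zone and the guard pages.  */
  assert (stack_reserve >= dead_zone + guard);
  return stack_reserve - dead_zone - guard;
}
```

With the 2 MB default reserve (0x200000), three guard pages give
0x1fc000, two give 0x1fd000, and one gives 0x1fe000, matching the
W8.1 64 bit, W8.1 32 bit (WOW) and W7 32 bit outputs respectively.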

Note that addr < beg, so we never reach the siglongjmp.

I have no explanation for this.  What OS?  What does rlim_cur contain?
What does peflags -x print for this executable?

I'm on W7 64-bit.  The problem seems to be that rlim_cur is too big.

$ peflags -x ./emacs
./emacs: stack reserve size      : 8388608 (0x800000) bytes

(gdb) p beg
$3 = 0x82ca27 ""
(gdb) p/x rlim.rlim_cur
$2 = 0x850e80

So end wraps around below zero when it is computed, because rlim_cur
is larger than beg:

(gdb) p end
$4 = 0xfffffffffffdbba7 <error: Cannot access memory at address 0xfffffffffffdbba7>

This doesn't happen when I run your testcase with the same 8MB stack size:

$ peflags -x0x800000 ./sigalt.exe
./sigalt.exe: stack reserve size      : 8388608 (0x800000) bytes

(gdb) p beg
$1 = 0x82cabb ""
(gdb) p/x rlim.rlim_cur
$2 = 0x7fd000
(gdb) p end
$3 = 0x2fabb
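
The two results above can be reproduced with plain integer arithmetic.
The following C sketch models only the part of the range test that is
visible in the gdb session (the condition on line 1663 is cut off there,
and the rest of it is not modelled).  The function name
addr_in_stack_range and the end_out parameter are invented for this
illustration, addresses are treated as 64-bit unsigned integers, and
stack_direction is assumed to be -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the visible part of the check in
   handle_sigsegv.  Unsigned 64-bit subtraction wraps modulo 2^64,
   which mirrors what happened to the pointer value in gdb.  */
static bool
addr_in_stack_range (uint64_t stack_bottom, uint64_t rlim_cur,
                     uint64_t addr, uint64_t *end_out)
{
  uint64_t beg = stack_bottom;
  /* stack_direction == -1: the stack grows downward.  */
  uint64_t end = stack_bottom - rlim_cur;

  /* Report the raw computed value, before any swap.  */
  if (end_out)
    *end_out = end;

  /* Make sure beg is the lower bound.  */
  if (beg > end)
    {
      uint64_t tmp = beg;
      beg = end;
      end = tmp;
    }
  return beg < addr && addr < end;
}
```

With the emacs values (stack_bottom 0x82ca27, rlim_cur 0x850e80) the
computed end is 0xfffffffffffdbba7, no swap takes place, and the faulting
address 0x33ff8 is below beg, so the test fails.  With the sigalt values
(stack_bottom 0x82cabb, rlim_cur 0x7fd000) end is 0x2fabb and the bounds
are swapped as intended.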

And last but not least, what is emacs doing there?  The stack should be
in pretty good shape when it's back in the main loop.  The stack
is fully committed and has the default number of guard pages at the
bottom, as it is just short of the stack overflow.

For debugging purposes I also added a global variable called "tib" and a
memory info struct called "m" to the testcase which are initialized
right at the start of main.  tib points to the start of the TEB (Thread
Environment Block, a Windows per-thread bookkeeping structure) of the
main thread.  If you expand it right after it's fetched, you get
something along these lines:

   (gdb) p *tib
   $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x20c000,
     SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
     ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}

Note the values of StackBase and StackLimit and compare with your beg and
end values.  StackBase is the upper limit of the stack.  It grows downward
from there.  StackLimit is the lowest address committed so far.  It's not
much yet as you can see, 0x230000-0x20c000 == 0x24000 == 144K.  Since Cygwin
executables have a default stack of 2 Megs, the allocation base of the stack
is probably at 0x30000.  This can be checked by looking at m:

   (gdb) p m
   $1 = {BaseAddress = 0x22c000, AllocationBase = 0x30000, AllocationProtect = 4,
     RegionSize = 16384, State = 4096, Protect = 4, Type = 131072}

See the value of AllocationBase.

When you hit the breakpoint in handle_sigsegv, the output of tib should
look like this:

   (gdb) p *tib
   $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x32000,
     SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
     ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}

Observe the value of StackLimit.  For this output I ran the testcase on
W7 32 bit.  It has a default guard page size of 4K.  The new wrapper I
wrote in Cygwin restored the stack to its state right before the stack
overflow occurred:

   - At 0x30000 we have the 4K dead zone, which is always only reserved,
     never committed.

   - At 0x31000 the 4K guard page starts.

   - Thus the StackLimit (the start of the committed region of the stack)
     is at 0x32000.
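
The layout in this list, and the 144K figure computed earlier from the
first tib output, can be checked with a few lines of C.  This is only a
sketch of the address arithmetic described in this message: the helper
names and LAYOUT_PAGE_SIZE are hypothetical, and a 4K page size is
assumed for both the dead zone and the guard pages.

```c
#include <stdint.h>

/* Assumed page size for dead zone and guard pages in this sketch.  */
#define LAYOUT_PAGE_SIZE 0x1000u

/* Hypothetical helper: expected StackLimit right before a stack
   overflow, i.e. the first address above the dead zone and the guard
   pages that sit at the allocation base of the stack.  */
static uint64_t
stack_limit_before_overflow (uint64_t allocation_base, unsigned guard_pages)
{
  uint64_t guard_start = allocation_base + LAYOUT_PAGE_SIZE; /* skip dead zone */
  return guard_start + (uint64_t) guard_pages * LAYOUT_PAGE_SIZE;
}

/* Hypothetical helper: bytes of stack between StackLimit (lowest
   committed address) and StackBase (upper limit of the stack).  */
static uint64_t
committed_stack_bytes (uint64_t stack_base, uint64_t stack_limit)
{
  return stack_base - stack_limit;
}
```

For the W7 32 bit values above, an AllocationBase of 0x30000 with one
guard page yields a StackLimit of 0x32000, and the committed region from
0x32000 up to the StackBase of 0x230000 is 0x1fe000 bytes, the same
number rlim_cur reports on that system.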

You can utilize tib and m for testing in emacs as well.  Just do this:

   #include <windows.h>

   NT_TIB *tib;
   MEMORY_BASIC_INFORMATION m;

   [...]

   in main:

   /* Record (approximately) where the stack begins.  */
   stack_bottom = &stack_bottom_variable;
   tib = (NT_TIB *) __readfsdword(PcTeb);
   VirtualQuery (stack_bottom, &m, sizeof m);

I'll try this next and report back.

PcTeb seems to be defined only on x86.  What should I do on x86_64?

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

