64-bit emacs crashes a lot

Sat Aug 10 13:59:00 GMT 2013

On 8/9/2013 11:28 PM, Ryan Johnson wrote:
> On 08/08/2013 2:00 PM, Ryan Johnson wrote:
>> On 08/08/2013 1:42 PM, Ken Brown wrote:
>>> On 8/5/2013 11:29 AM, Ryan Johnson wrote:
>>>> On 05/08/2013 11:00 AM, Ken Brown wrote:
>>>>> On 8/3/2013 3:05 PM, Ryan Johnson wrote:
>>>>>> On 02/08/2013 8:07 AM, Ryan Johnson wrote:
>>>>>>> On 02/08/2013 7:04 AM, Ken Brown wrote:
>>>>>>>> On 8/2/2013 4:02 AM, Corinna Vinschen wrote:
>>>>>>>>> On Aug  1 22:46, Ryan Johnson wrote:
>>>>>>>>>> Here's a new one... I started a compilation, but before it actually
>>>>>>>>>> invoked the command it started pegging the CPU. After ^G^G^G, it
>>>>>>>>>> crashed with the following:
>>>>>>>>>>> Auto-save? (y or n) y
>>>>>>>>>>>       0 [main] emacs 5076 C:\cygwin64\bin\emacs-nox.exe: *** fatal error - Internal error: TP_NUM_W_BUFS too small 2268032 >= 10.
>>>>>>>>>
>>>>>>>>> That looks like a memory overwrite.  2268032 is 0x229b80, which looks
>>>>>>>>> suspiciously like a stack address.  And the overwritten value is on the
>>>>>>>>> stack, too, well within the cygwin TLS area.  If *this* value gets
>>>>>>>>> overwritten, the TLS is probably totally hosed at this point.  There's
>>>>>>>>> just no way to infer the culprit from this limited info.
>>>>>>>>
>>>>>>>> Could this be BLODA?  Ryan, I noticed that you wrote in a different
>>>>>>>> thread, "I recently migrated to 64-bit cygwin...and so far have not
>>>>>>>> had to disable Windows Defender; the latter was a recurring source of
>>>>>>>> trouble for my previous 32-bit cygwin install on Win7/64."
>>>>>>> This would be a whole new level of nasty from a BLODA... I thought
>>>>>>> they only interfered with fork()?
>>>>>>>
>>>>>>> However, this *is* Windows Defender we're talking about... service
>>>>>>> disabled and all cygwin processes restarted. I'll let you know in a
>>>>>>> day or so if the crashes go away.
>>>>>> Rats. I just had another crash, the "Fatal error 6" variety. Windows
>>>>>> Defender has not turned itself back on (it's been known to do that),
>>>>>> and a scan of the BLODA list didn't match anything else on my system.
>>>>>>
>>>>>> So I don't think it's BLODA...
>>>>>>
>>>>>> Ideas?
>>>>>
>>>>> Not really, other than the obvious: (a) Find a reproducible way of
>>>>> making emacs-nox crash.  (b) Catch the crash in gdb by setting a
>>>>> suitable break point.
> Got one! Looks like a stack overflow somewhere in the garbage collector:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 5316.0x1af4]
> 0x00000001004df44a in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
> 5903            if (CONS_MARKED_P (ptr))
> (gdb) bt
> #0  0x00000001004df44a in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
> #1  0x00000001004df66e in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
> #2  0x00000001004df593 in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5809
> #3  0x00000001004df66e in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
> #4  0x00000001004df66e in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
> #5  0x00000001004df585 in mark_object (arg=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5808
> #6  0x00000001004dfa4e in mark_vectorlike (
>      ptr=0x100f66f28 <bss_sbrk_buffer+6955080>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5501
> ... snip ...
> #2606 0x00000001004dfaf4 in mark_buffer (buffer=<optimized out>)
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5552
> #2607 0x00000001004dff2c in Fgarbage_collect ()
>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5181
> #2608 0x0000000000000000 in ?? ()

I don't know whether 2608 stack frames is unusual or not.  Is this 
enough to cause a stack overflow?

> I have the full backtrace saved to file, let me know if that would be
> useful (there wasn't anything obvious that I could see, just more of the
> same). Meanwhile, I verified that none of the addresses printed is
> repeated, so it doesn't seem to be due to an obvious cycle in the object
> graph.

From what you've shown, it appears that most of the addresses have been 
optimized out.  I think you would need an unoptimized build in order to 
check that, wouldn't you?

> The crash happened when I foregrounded a stopped emacs. I tried playing
> around with various breakpoints while repeatedly sending ^Z, but no luck
> repeating the "feat" yet.
>
> Ideas?

Can you trigger the bug by calling garbage collection manually (M-x 
garbage-collect)?  What happens if you put a breakpoint at 
Fgarbage_collect and step through it?  (Again, you might need an 
unoptimized build before that will be useful.)

There are lots of Lisp variables that can be used to control garbage 
collection and get information about it.  See the section on garbage 
collection in the Emacs Lisp manual.  For example, you could try setting 
garbage-collection-messages to a non-nil value so that you can see when 
collections run.  Or you could play with gc-cons-threshold to change how 
often they happen.
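
As a rough sketch of what I mean, something like the following could be 
evaluated in *scratch* (or put in your init file while debugging).  The 
100 MB value is an arbitrary number I picked for illustration, not a 
recommendation, and I haven't tried this against your crash:

```elisp
;; Sketch only: evaluate in *scratch* or place in the init file while debugging.

;; Display a message at the start and end of every garbage collection,
;; so a hang or crash can be correlated with GC activity.
(setq garbage-collection-messages t)

;; Assumption: 100 MB is an arbitrary illustrative value.
;; A larger threshold makes automatic collections rarer; a smaller one
;; makes them more frequent, which might help reproduce the crash sooner.
(setq gc-cons-threshold (* 100 1024 1024))

;; Force a collection by hand, then report the built-in counters:
;; gcs-done is the number of collections so far, gc-elapsed the
;; accumulated time spent in them (in seconds).
(garbage-collect)
(message "GCs so far: %d, total GC time: %.2fs" gcs-done gc-elapsed)
```

If a manual (garbage-collect) never crashes but automatic ones do, that 
would at least suggest the problem depends on when the collection happens 
to run.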

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple