This is the mail archive of the cygwin mailing list for the Cygwin project.



Mysterious random crashes with latest snapshots


Hi,

I've been experiencing intermittent crashes on my Windows XP laptop with
the past few DLL versions (from 1.5.16 to the latest snapshot).  They are
extremely hard to reproduce and happen seemingly at random, in various
applications (most often bash, but I've also seen them in xargs, man,
etc.).  The only correlations I've noticed are that the crashes seem to
happen while spawning child processes, and that they occur more often
when memory usage is high.  Only Cygwin applications crash --
everything else seems to run smoothly.

I know this isn't much of a bug report, but the details of what I've tried
are below (warning: long).

I was able to get a crash under the latest snapshot (CYGWIN_NT-5.1
1.5.18s(0.132/4/2) 20050624 09:33:10), with error_start set to gdb.
Here's part of the debugging session:

Attaching to program `/usr/bin/bash.exe', process 6140

[Switching to thread 6140.0x5bc]
(gdb) info threads
* 4 thread 6140.0x5bc  0x77f75a59 in ?? ()
   from /cygdrive/c/WINDOWS/System32/ntdll.dll
  3 thread 6140.0xf4c  0x7ffe0304 in ?? ()
  2 thread 6140.0x10f0  0x7ffe0304 in ?? ()
  1 thread 6140.0x398  0x610469d1 in fork ()
    at /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h:178
Current language:  auto; currently c++
(gdb) thread 1
[Switching to thread 1 (thread 6140.0x398)]#0  0x610469d1 in fork ()
    at /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h:178
178     /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h: No such file or directory.
        in /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h
(gdb) where
#0  0x610469d1 in fork ()
    at /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h:178
#1  0x6108439f in _sigfe ()
    at /netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/cygserver.h:82
#2  0x0022e334 in ?? ()
#3  0x00435d27 in fhandler_pipe::get_guard ()
#4  0x0000000f in ?? ()
#5  0x00080000 in ?? ()
#6  0x00080002 in ?? ()
#7  0xffffffff in ?? ()
#8  0x004e9e90 in ?? ()
#9  0x00000000 in ?? () from
(gdb) quit

Frames 2-9 look totally bogus, FWIW.  Here's a full stack dump produced by
(another invocation of) bash:

Exception: STATUS_ACCESS_VIOLATION at eip=610469D1
eax=42C36DE2 ebx=00000000 ecx=60030000 edx=00000000 esi=00000000 edi=00000000
ebp=0022E9D8 esp=0022E760 program=C:\cygwin\bin\bash.exe, pid 3112, thread main
cs=001B ds=0023 es=0023 fs=0038 gs=0000 ss=0023
Stack trace:
Frame     Function  Args
0022E9D8  610469D1  (000000FF, 0022E9F8, 0022E9F4, 00435D27)
0022EA08  6108439F  (004E6000, 00000000, 0022EA48, 00437F88)
0022EA58  00411344  (004EC9B8, 004EB088, 004E6480, FFFFFFFF)
0022EAC8  00410423  (004E9590, FFFFFFFF, FFFFFFFF, 00000000)
0022EB08  0040D8FF  (004E9578, 00000000, FFFFFFFF, FFFFFFFF)
0022EB38  0040D425  (004E9578, 00000001, 0022EB88, 0040DB05)
0022EB48  0040FB5B  (004E9560, 00000001, 00000001, 00000000)
0022EB88  0040DB05  (004E9548, 00000000, FFFFFFFF, FFFFFFFF)
0022EBB8  0040EA7E  (004E74B8, 00000000, FFFFFFFF, FFFFFFFF)
0022EBF8  0040DB42  (004E74B8, 00000000, FFFFFFFF, FFFFFFFF)
0022EC28  0040D425  (004E74B8, 004E5150, 0022EF88, 77E9B2E5)
0022EC58  0040EA48  (004E7488, 00000000, FFFFFFFF, FFFFFFFF)
0022EC98  0040DB42  (004E7488, 00000000, FFFFFFFF, FFFFFFFF)
0022ECD8  0040DBBB  (004E7470, 00000000, FFFFFFFF, FFFFFFFF)
0022ED08  00410D4F  (004E9690, 004E81F8, 00000008, 004E8318)
0022ED38  004111D9  (004E81F8, 00000000, 004E9690, 00000000)
End of stack trace (more stack frames may be present)

BTW, for some reason addr2line only decodes the first address:

$ awk '/^[0-9]/{print $2}' bash.exe.stackdump | addr2line -e /bin/cygwin1.dll
/netrel/src/cygwin-snapshot-20050624-1/winsup/cygwin/pinfo.h:178
??:0
??:0
...

though it seems that

0022EA08  6108439F  (004E6000, 00000000, 0022EA48, 00437F88)

should be decoded as well.  The relevant line of pinfo.h is

  _pinfo *operator -> () const {return procinfo;}

and, looking at the arguments, the pinfo object pointer seems to be
corrupted.
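
As a rough aid for the addr2line step above, here is a minimal Python
sketch (not part of the original report) that splits the Function column
of a stackdump by module before decoding.  It assumes the 32-bit
cygwin1.dll is loaded at its usual base of 0x61000000; the window size
and the helper names frame_addresses and split_by_module are made up for
illustration.  Addresses in the 0x0040xxxx range presumably belong to
bash.exe itself, which would explain why they come back as ??:0 when
looked up in the DLL, though not why 6108439F fails.

```python
import re

# Assumption: the 32-bit cygwin1.dll sits at its usual base, 0x61000000.
# The 16 MB window is an illustrative guess, not the real image size.
CYGWIN_DLL_BASE = 0x61000000
CYGWIN_DLL_WINDOW = 0x01000000

# A frame line looks like: "0022E9D8  610469D1  (000000FF, ...)"
FRAME_RE = re.compile(r"^([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+\(")


def frame_addresses(stackdump_text):
    """Return the Function column of each frame line as integers."""
    addrs = []
    for line in stackdump_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            addrs.append(int(match.group(2), 16))
    return addrs


def split_by_module(addrs):
    """Separate addresses inside the assumed DLL window from the rest."""
    in_dll, elsewhere = [], []
    for addr in addrs:
        if CYGWIN_DLL_BASE <= addr < CYGWIN_DLL_BASE + CYGWIN_DLL_WINDOW:
            in_dll.append(addr)
        else:
            elsewhere.append(addr)
    return in_dll, elsewhere


# Three frames copied from the stack dump above.
SAMPLE = """\
Stack trace:
Frame     Function  Args
0022E9D8  610469D1  (000000FF, 0022E9F8, 0022E9F4, 00435D27)
0022EA08  6108439F  (004E6000, 00000000, 0022EA48, 00437F88)
0022EA58  00411344  (004EC9B8, 004EB088, 004E6480, FFFFFFFF)
End of stack trace (more stack frames may be present)
"""

in_dll, elsewhere = split_by_module(frame_addresses(SAMPLE))
print([f"{a:08X}" for a in in_dll])      # ['610469D1', '6108439F']
print([f"{a:08X}" for a in elsewhere])   # ['00411344']
```

The first list could then be passed to addr2line -e /bin/cygwin1.dll and
the second to addr2line -e /bin/bash.exe, although the latter will only
help if bash.exe still carries debugging symbols.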

Unfortunately, I was unable to reproduce this under strace.

Since these crashes are so intermittent, does anyone have suggestions for
debugging them further?  I could compile a debugging version of the Cygwin DLL
with extra information printed, but it'd help to know what information
would be useful.  Slightly obfuscated output of "cygcheck -svr" is
attached.
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"The Sun will pass between the Earth and the Moon tonight for a total
Lunar eclipse..." -- WCBS Radio Newsbrief, Oct 27 2004, 12:01 pm EDT

Attachment: cygcheck-20050630.out
Description: Text document
