This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.



Re: Debugging problem in peek_pipe in select.cc


On Thu, Nov 08, 2001 at 10:55:42AM -0500, Jonathan Kamens wrote:
>I'm trying to debug why "make -j2" continues to hang for us
>occasionally even after cgf's recent fix to the code in this area.
>
>After deploying a cygwin1.dll with his fix, I ran two builds in a row
>which both hung.  I didn't get much useful information out of them, so
>I set things up to be able to debug better in case of future hangs,
>and then started running builds.
>
>I ran a whole bunch of builds over several days and none of them
>hung.  Finally, one of them hung, and then one of my coworkers killed
>and restarted it before I could debug it :-).
>
>Shortly after that, I finally got another build to hang, and I'm
>looking at that one now.  Here's the current roadblock preventing me
>from understanding what's going on....
>
>I attached to a hung process.  The top of its stack trace in thread 1
>looks like this:
>
>  #0  0x77f67a5b in ?? ()
>  #1  0x61053b08 in peek_pipe (s=0x24aeee4, ignra=0, guard_mutex=0x1dc)
>      at /u/jik/cygwin-cvs/src/winsup/cygwin/select.cc:453
>  #2  0x61053eba in fhandler_pipe::ready_for_read (this=0x61544920, fd=6, 
>      howlong=4294967295, ignra=0)
>      at /u/jik/cygwin-cvs/src/winsup/cygwin/select.cc:512
>  #3  0x61062b97 in _read (fd=6, ptr=0x24aeff2, len=1)
>      at /u/jik/cygwin-cvs/src/winsup/cygwin/syscalls.cc:315
>  #4  0x6108cbce in read (fd=6, buf=0x24aeff2, cnt=1)
>      at /u/jik/cygwin-cvs/src/newlib/libc/syscalls/sysread.c:15
>
>Line 453 of select.cc is a call to PeekNamedPipe.  According to the
>MSDN documentation for PeekNamedPipe, it never hangs.  So, thinking
>that frame 0 must be the PeekNamedPipe invocation, I typed "frame 0"
>and then "finish" in a "gdb -nw" window (running inside an ssh session
>to the Windows servers), and now it's hung.  How can that be?  I don't
>get it.

Actually, the point of the mutex I added to peek_pipe was to prevent
PeekNamedPipe from blocking.  It can block in pathological situations
when another thread/process is doing a blocking read.  From your backtrace,
it looks like you are running an older version of the sources.  I have been
making a lot of changes to select to try to fix this problem.
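
To illustrate the guard idea, here is a minimal portable sketch.  It is
not the code in select.cc.  It assumes that a reader holds the guard for
the whole duration of a blocking read and that the peek side only tries
to take the guard without waiting.  GuardedPipe, Guard, PeekResult and
peek_pipe_guarded are hypothetical names, a std::atomic<bool> try-lock
stands in for the Win32 mutex handle, and a byte queue stands in for the
pipe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical try-lock standing in for the Win32 guard mutex.
struct Guard {
    std::atomic<bool> held{false};
    // Returns true only if the guard was free and is now owned by the caller.
    bool try_acquire() { return !held.exchange(true, std::memory_order_acquire); }
    void release() { held.store(false, std::memory_order_release); }
};

// Hypothetical stand-in for a pipe: queued bytes plus the guard that a
// reader is assumed to hold while it sits in a blocking read.
struct GuardedPipe {
    Guard guard;
    std::deque<char> buffer;
};

enum class PeekResult { Busy, Empty, Ready };

// Never waits: if someone else owns the guard, report Busy instead of
// touching the pipe, which is where a real peek could otherwise stall.
PeekResult peek_pipe_guarded(GuardedPipe& p, std::size_t& avail) {
    avail = 0;
    if (!p.guard.try_acquire())
        return PeekResult::Busy;
    avail = p.buffer.size();  // stand-in for the PeekNamedPipe call
    p.guard.release();
    return avail ? PeekResult::Ready : PeekResult::Empty;
}
```

A caller would treat Busy the same as "no data yet" and retry later.  If
any reader enters a blocking read without taking the guard first, the
protection is lost, which is one way a remaining race could show up.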

One change in particular allowed me to run "make -j2" for more than 24 hours
with no hang.

I'm sorry that I didn't specifically send you email about this.

>^C has no effect at this point, so I can't get it to stop the process
>and tell me where it is now.

If Cygwin is in a blocking Win32 API call, then ^C will not work.
ready_for_read is specifically designed not to block, so that signals
still work during blocking reads.
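
As a rough sketch of why a non-blocking readiness check keeps signals
working, consider a wait loop that alternates between checking a
"signal pending" flag and a non-blocking poll.  This is an assumption
about the general shape of such a loop, not the actual ready_for_read
code.  wait_for_read, WaitResult, poll_ready and signal_pending are
hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

enum class WaitResult { Ready, Interrupted, TimedOut };

// poll_ready must be a non-blocking check (for example a guarded peek).
// signal_pending is a hypothetical flag set by the signal-delivery path.
WaitResult wait_for_read(const std::function<bool()>& poll_ready,
                         const std::atomic<bool>& signal_pending,
                         std::chrono::milliseconds timeout,
                         std::chrono::milliseconds step = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        // Signals are looked at on every pass, so ^C is never starved.
        if (signal_pending.load())
            return WaitResult::Interrupted;
        if (poll_ready())
            return WaitResult::Ready;
        if (std::chrono::steady_clock::now() >= deadline)
            return WaitResult::TimedOut;
        std::this_thread::sleep_for(step);  // short nap, never an unbounded block
    }
}
```

The loop only stays responsive if poll_ready itself returns promptly.
If the poll ever blocks inside the operating system, as in the backtrace
above, the signal check is never reached again.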

If you are still seeing hangs in the most recent sources, then there is
still some kind of race with the guard mutex in peek_pipe.  That is
where you will need to investigate.

cgf

