showstopper bugs (boring technical details -- run away! run away!)

Christopher Faylor cgf@redhat.com
Mon Nov 6 08:33:00 GMT 2000


On Mon, Nov 06, 2000 at 09:55:30AM -0500, Town, Brad wrote:
>Chris Faylor wrote:
>>I've had a couple of show stopper bugs reported to me which, of course,
>>I can't duplicate, so I've held off on the release until I can either
>>duplicate and fix them or someone else can fix them (hah).
>
>Arrgh! There's that "hah" again! :)
>
>Would it be possible for you to briefly recap the show-stopper bugs?
>I'll help if I can.

Wow.  I've really stumbled onto something with the (hah).

The showstopper bugs were (I'm using the past tense because I am such an
incurable optimist) random errors from wait_subproc when logging in via
ssh.  Corinna reported them and since they were indicative of a serious
problem in cygwin, I've been trying to track them down "in my spare
time" (I'm supposed to be doing more managing and less programming).

I duplicated the problems last night at around 9PM and checked in a fix
at around 1AM.  As I was triumphantly drifting off to sleep, I realized
that some of my fix was questionable, so I have to redo it today.

The problem was due to the way cygwin handles the 'exec' call.  Since
Windows has nothing that says "start a new process and give it the same
pid", we have to kludge around this.  So, when a program exec's, a stub
sticks around waiting for an event from the newly "execed" process.  When
it gets the event, the stub opens the parent process with OpenProcess,
duplicates a handle to the newly execed process into its parent, and then
exits.  The parent notices the exit, discovers that there is a new handle
for its child, does some bookkeeping, and goes back to waiting for children
to exit.
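
For readers unfamiliar with the semantics being emulated, here is a minimal
POSIX sketch of what "exec keeps the same pid" means.  It is not taken from
the cygwin sources; the function name exec_keeps_pid is made up for
illustration, and it assumes a system with /bin/sh.  It only shows the
behavior that the stub-and-handle kludge described above has to fake on
Windows.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, have it exec a shell that prints its
   own pid, and compare that with the pid fork() gave the parent.
   Returns 1 if they match, 0 if they differ, -1 on error. */
int exec_keeps_pid(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: send stdout to the pipe, then replace this process image. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: read the pid that the exec'ed shell reports about itself. */
    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);

    int status;
    if (waitpid(child, &status, 0) < 0 || n <= 0)
        return -1;

    /* On POSIX the exec'ed program runs under the pid fork() returned. */
    return atol(buf) == (long)child;
}
```

Called from a main() on a POSIX system, exec_keeps_pid() is expected to
return 1: the parent keeps waiting on the same pid across the exec.  On
Windows a new program means a new process with a new id, which is why the
parent has to be handed a fresh handle as described above.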

The problem was that the process of contacting the parent was not 100%
reliable.  I don't know why this is now the case, but I worked around the
problem by always giving every child a handle to its parent process.
This is something that I've wanted to do for a while anyway.

In the process of fixing this bug, I stumbled across several other *#$!
signal races which I worked around.  Today, after a fresh night's sleep,
I believe that I know how to fix them.

Anyway, thanks for the offer.  If you want to look at the code in question,
it's in sigproc.cc (wait_subproc) and spawn.cc (spawn_guts).  This is not
for the faint of heart.  I keep meaning to add more comments and document
the whole sorry mess but I've never gotten around to it.

By the way, I now need to do some laundry unless someone else gets around
to it (hah).

cgf
