This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Race condition in 19990922 snapshot [PATCH]


The appended patch fixes a race condition in select.cc.

This code creates a Win32 socket ("exitsock") which it uses to let the
master thread notify a slave thread that it should exit a particular
call to Winsock select().  (I know, technically, that threads don't
really have a master/slave relationship; but look at the code and you
will see what I mean.)

The problem here is twofold:

  1) Doing select() on something created by socket(), when that socket
     has not been connected our bound to anything, does not have
     defined behavior in any documentation I can find.  The code in
     select.cc seems to rely on some weird Winsock behavior; namely,
     that calling closesocket() on a newly-created socket will trigger
     an "exceptional condition" for any select() waiting on it.

  2) There is a race condition anyway.  It is possible for the master
     thread to call closesocket() before the slave thread even gets to
     the select().  The result is to select() on a value which is no
     longer a legitimate socket at all.  I have seen this race happen
     under strace.  The result is (usually) for the select() to return
     with an error code, which Cygwin pretty much ignores and
     continues plowing through.  Sometimes the result is a complete
     hang on our SMP box.  (I do not have details of the failures, but
     I suspect some kind of corruption in the internals of Winsock or
     Cygwin.)

The enclosed patch uses a socketpair() instead of a single socket.
The slave thread now selects for reading on one member of the pair;
the master can close the other end to notify the slave it should fall
out of select().  This fixes both of the problems: The behavior is now
well-defined, and the race condition is avoided.

After installing this patch, most of the Cygwin hangs we were
experiencing are now gone.  At least, I can't reproduce them in 20
seconds like I could before :-).

 - Pat

P.S.  You will get a compile-time warning because socketpair() is not
declared.  I was unable to find it in any .h file or I would have
#included it...


======================================================================

--- cygwin-src/winsup/select.cc	Wed Sep 22 23:57:31 1999
+++ cygwin-src-patl/winsup/select.cc	Fri Sep 24 15:59:42 1999
@@ -991,7 +991,7 @@ struct socketinf
   {
     HANDLE thread;
     winsock_fd_set readfds, writefds, exceptfds;
-    SOCKET exitsock;
+    int exitsockpair[2];
     select_record *start;
   };
 
@@ -1090,8 +1090,9 @@ thread_socket (void *arg)
 	    }
 	}
 
-  if (WINSOCK_FD_ISSET (si->exitsock, &si->exceptfds))
-    select_printf ("exitsock got an exception");
+  if (WINSOCK_FD_ISSET (dtable[si->exitsockpair[1]]->get_handle (),
+                        &si->readfds))
+    select_printf ("exitsockpair was closed");
   return 0;
 }
 
@@ -1133,15 +1134,16 @@ start_thread_socket (select_record *me, 
 	  }
       }
 
-  if ((si->exitsock = socket (PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+  if (socketpair (PF_INET, SOCK_STREAM, 0, si->exitsockpair) != 0)
     {
-      __seterrno ();
       select_printf ("cannot create socket");
       return -1;
     }
 
-  select_printf ("exitsock %p", si->exitsock);
-  WINSOCK_FD_SET ((HANDLE)si->exitsock, &si->exceptfds);
+  select_printf ("exitsocks %d %d", si->exitsockpair[0],
+                 si->exitsockpair[1]);
+  WINSOCK_FD_SET (dtable[si->exitsockpair[1]]->get_handle (),
+                  &si->readfds);
   stuff->device_specific[FHDEVN(FH_SOCKET)] = (void *) si;
   si->start = &stuff->start;
   select_printf ("stuff_start %p", &stuff->start);
@@ -1157,8 +1159,9 @@ socket_cleanup (select_record *me, selec
   select_printf ("si %p si->thread %p", si, si ? si->thread : NULL);
   if (si && si->thread)
     {
-      closesocket (si->exitsock);
+      _close (si->exitsockpair[0]);
       WaitForSingleObject (si->thread, INFINITE);
+      _close (si->exitsockpair[1]);
       CloseHandle (si->thread);
       stuff->device_specific[FHDEVN(FH_SOCKET)] = NULL;
       delete si;

--
Want to unsubscribe from this list?
Send a message to cygwin-unsubscribe@sourceware.cygnus.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]