This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



RE: 1.5.24 (and later): race condition in sigproc.cc


Unfortunately the NO_COPY change doesn't appear to solve the problem.  

The semantics of DllMain are that it acts like a monitor so only one
thread can be inside the DllMain routine at a time.  This does not
guarantee that other threads are completely inactive.  In particular,
other threads could be reading the in_dllentry variable while DllMain is
being invoked for a newly created thread.  I think the ultimate solution
has to be a mutex (spinlock) around any critical initializations as well
as any accesses to those data structures in other threads.
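
A rough sketch of the spinlock approach proposed above follows. It assumes portable C++11 (std::atomic_flag, std::thread) in place of the real DllMain/Win32 environment. Every name in it (init_lock, spin_acquire, thread_attach, saw_in_dllentry, count_spurious_observations, namespace sketch) is hypothetical and not taken from the Cygwin sources. The idea is that the flag is raised and lowered only while the lock is held, and readers take the same lock, so no reader can observe the transient "true" value.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration only; this is not the real
// Cygwin init.cc / sigproc.cc code.
namespace sketch {

std::atomic_flag init_lock = ATOMIC_FLAG_INIT;
bool in_dllentry = false;  // only read or written while init_lock is held

// Minimal test-and-set spinlock.
void spin_acquire() {
  while (init_lock.test_and_set(std::memory_order_acquire))
    std::this_thread::yield();  // back off instead of burning the CPU
}

void spin_release() { init_lock.clear(std::memory_order_release); }

// Models the thread-attach path: the flag is raised and lowered
// entirely inside the critical section.
void thread_attach() {
  spin_acquire();
  in_dllentry = true;
  // ... per-thread initialization would go here ...
  in_dllentry = false;
  spin_release();
}

// Models a reader such as the check in sig_send(): because it reads
// the flag under the same lock, it cannot observe the transient "true".
bool saw_in_dllentry() {
  spin_acquire();
  bool v = in_dllentry;
  spin_release();
  return v;
}

// Runs attaching threads and reading threads concurrently and returns
// how many times a reader saw in_dllentry set (expected: 0).
int count_spurious_observations(int iterations) {
  std::atomic<int> spurious{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < 2; ++i)
    threads.emplace_back([iterations] {
      for (int n = 0; n < iterations; ++n) thread_attach();
    });
  for (int i = 0; i < 2; ++i)
    threads.emplace_back([iterations, &spurious] {
      for (int n = 0; n < iterations; ++n)
        if (saw_in_dllentry()) ++spurious;
    });
  for (auto &t : threads) t.join();
  return spurious.load();
}

}  // namespace sketch
```

The cost of this design is that every reader now contends for the same lock as thread attach, so in the real code the critical section would need to stay very short.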

--Scott

-----Original Message-----

On Fri, Jul 13, 2007 at 05:09:24PM -0700, Scott Stanton wrote:
>I have found what I believe to be a race condition in sigproc.cc and
>exceptions.cc.  The problem is that any access to the in_dllentry
>variable defined in init.cc is vulnerable to a race condition when a new
>thread is being initialized.
>
>The initial symptom is that under heavy load on a multiprocessor machine
>cygwin processes intermittently fail with a "fork: Resource temporarily
>unavailable" error.  I tracked this to the sig_send() calls inside
>fork().  These calls were failing, causing fork to return EAGAIN.  The
>sig_send() call was failing on the first no_signals_available() test.
>After expanding the macro to see which arm of the test was failing, it
>turns out that sig_send() was seeing a non-zero value for in_dllentry.
>This boolean is set during the call to dll_entry() whenever a process or
>thread attaches to the cygwin dll.  Because the in_dllentry variable is
>checked without holding a mutex, threads calling sig_send() can
>temporarily see the value as true when a new thread is starting.  If the
>sig_send() code is modified to retry after yielding the processor, the
>second attempt succeeds.
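
The retry-after-yield workaround described in the report above could look roughly like the following minimal sketch. The names (retry_sketch, signals_blocked, send_with_retry) are hypothetical and do not match sigproc.cc. signals_blocked() stands in for the in_dllentry arm of no_signals_available(). A bounded retry like this only narrows the window in which the transient value is seen; it does not remove the underlying race.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration only; names do not match sigproc.cc.
namespace retry_sketch {

std::atomic<bool> in_dllentry{false};

// Stand-in for the in_dllentry arm of no_signals_available():
// reports "blocked" while the flag is set.
bool signals_blocked() { return in_dllentry.load(std::memory_order_acquire); }

// Returns true if the send could proceed within max_attempts tries,
// false if the caller would have to fail (e.g. fork returning EAGAIN).
bool send_with_retry(int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (!signals_blocked())
      return true;              // the real code would deliver the signal here
    std::this_thread::yield();  // give the attaching thread time to finish
  }
  return false;
}

}  // namespace retry_sketch
```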

When a process or thread attaches via dll_entry it is supposed to be
single threaded at that point so another thread isn't supposed to be
able to see that variable as true.  So, what you may be seeing is that
in_dllentry is being duplicated by fork and that is causing a problem.
The fix for that is simple.  I've checked it in.

If in_dllentry is really still getting set and other threads can see that
then much more work will need to be done since that violates a lot of
assumptions about what goes on via process/thread attach/detach.

cgf

