This is the mail archive of the cygwin mailing list for the Cygwin project.



Instability with signals and threads


Hi

I have a program that sets a repetitive timer with setitimer and spawns 
several threads.
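
For illustration, the shape of such a program can be sketched as follows. 
This is not the original failing program: the function names, the 1 ms 
interval and the thread body are all assumptions, and the sketch is not 
guaranteed to trigger the lockup.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/time.h>
#include <vector>

// Hypothetical names throughout; this is not the original failing program.
static volatile sig_atomic_t tick_count = 0;

static void on_alarm(int) { tick_count = tick_count + 1; }

static void *short_lived_thread(void *) {
    // Do a little work so that threads exit while timer signals are in flight.
    volatile unsigned spin = 0;
    for (unsigned i = 0; i < 100000; ++i) spin = spin + i;
    return nullptr;
}

// Arms a repeating 1 ms ITIMER_REAL timer, then creates and joins
// `threads_per_round` threads, `rounds` times over. Returns the number of
// SIGALRM deliveries observed, or -1 if the setup calls fail.
long run_timer_thread_stress(int rounds, int threads_per_round) {
    struct sigaction sa = {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, nullptr) != 0) return -1;

    struct itimerval tv = {};
    tv.it_interval.tv_usec = 1000;  // repeat every 1 ms (assumed value)
    tv.it_value.tv_usec = 1000;     // first expiry after 1 ms
    tick_count = 0;
    if (setitimer(ITIMER_REAL, &tv, nullptr) != 0) return -1;

    for (int r = 0; r < rounds; ++r) {
        std::vector<pthread_t> threads;
        for (int i = 0; i < threads_per_round; ++i) {
            pthread_t t;
            if (pthread_create(&t, nullptr, short_lived_thread, nullptr) == 0)
                threads.push_back(t);
        }
        for (pthread_t t : threads) pthread_join(t, nullptr);
    }

    // Disarm the timer; the handler is left installed so that a late
    // SIGALRM cannot hit the default (terminating) action.
    struct itimerval off = {};
    setitimer(ITIMER_REAL, &off, nullptr);
    return tick_count;
}
```

Calling run_timer_thread_stress with a large number of rounds from main 
keeps thread creation and exit overlapping with signal delivery, which is 
the situation described in this report.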

The program is very unstable on Cygwin; it locks up within a few minutes.

The bug manifests itself in the following way: the signal thread calls 
cygheap->find_tls to find a thread to deliver the signal to. find_tls 
generates an exception when scanning the threadlist, jumps to the __except 
block and calls threadlist[idx]->remove(INFINITE).

The method threadlist[idx]->remove is called with an invalid "this" 
pointer (sometimes it is zero, sometimes it points to unmapped memory), 
generates another exception on the "initialized = 0" line and becomes 
stuck on this assignment.

I found out that when I modify the remove_tls method so that it always 
acquires the lock and removes the thread from the threadlist (change 
"tls_sentry here(wait)" to "tls_sentry here(INFINITE)"), the bug goes away 
and the multithreaded program is stable.

Alternatively, the crash can be fixed by changing "_my_tls.remove (0)" to 
"_my_tls.remove (INFINITE)" in thread_wrapper (though there is another 
_my_tls.remove (0) call in dll_entry in winsup/cygwin/init.cc, and it 
could trigger the same crash).


I'd like to ask - what's the reason for not waiting for the lock in 
remove_tls? If the lock is already held, remove_tls does nothing - but 
the _cygtls structure is freed anyway, so a dangling pointer is left on 
the thread list. Do you think that we can drop this "wait" argument and 
always wait for the lock in remove_tls?
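
To make the question concrete, here is a simplified model of the two 
behaviours. It is not the Cygwin source: FakeTls, ThreadList and 
entry_left_behind are hypothetical names, and a plain std::mutex stands in 
for the tls_sentry lock. It only shows that a remove which gives up on a 
busy lock leaves the entry on the list, while a blocking remove does not.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Simplified model with hypothetical names; not the Cygwin source.
struct FakeTls { int id; };

class ThreadList {
public:
    void add(FakeTls *t) {
        std::lock_guard<std::mutex> g(lock_);
        list_.push_back(t);
    }
    // wait == false models remove(0): give up if the lock is busy.
    // wait == true models remove(INFINITE): block until the lock is free.
    bool remove(FakeTls *t, bool wait) {
        std::unique_lock<std::mutex> g(lock_, std::defer_lock);
        if (wait) g.lock();
        else if (!g.try_lock()) return false;  // entry stays on the list
        list_.erase(std::remove(list_.begin(), list_.end(), t), list_.end());
        return true;
    }
    bool contains(FakeTls *t) {
        std::lock_guard<std::mutex> g(lock_);
        return std::find(list_.begin(), list_.end(), t) != list_.end();
    }
    std::mutex &raw_lock() { return lock_; }
private:
    std::mutex lock_;
    std::vector<FakeTls *> list_;
};

// Returns true if the entry is still listed after its owner tried to
// remove it while another thread was holding the list lock.
bool entry_left_behind(bool wait) {
    ThreadList tl;
    FakeTls tls{1};
    tl.add(&tls);

    tl.raw_lock().lock();            // a "scanner" holds the list lock
    std::thread exiting([&] {
        tl.remove(&tls, wait);
        // A real exiting thread would release its storage here either way.
    });
    if (!wait) exiting.join();       // non-blocking remove returns at once
    tl.raw_lock().unlock();
    if (wait) exiting.join();        // blocking remove finishes after unlock

    return tl.contains(&tls);
}
```

In this model entry_left_behind(false) returns true (the stale entry 
survives) and entry_left_behind(true) returns false.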



Another possible bug - when find_tls exits, it drops the tls_sentry lock 
and returns the pointer to _cygtls. What happens if the thread owning the 
tls exits at this point? It seems that nothing prevents it from exiting, 
so the caller of find_tls (sigpacket::process) may work with a pointer to 
an invalid thread. It seems that we need to add a reference count to 
_cygtls to prevent it from disappearing while we are trying to send a 
signal to it. (Alternatively, we could keep tls_sentry::lock locked until 
sigpacket::process is done with the signal, though I don't know whether 
holding the lock for that long would cause deadlocks.)
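
The reference-count idea can be sketched with std::shared_ptr. This is 
only a model under my own assumptions: CountedTls and CountedThreadList 
are hypothetical names, and the lifetime of the real _cygtls may be 
managed quite differently, so a real fix would need its own counting 
scheme.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical model; names and layout are not taken from the Cygwin source.
struct CountedTls {
    int id;
    bool exiting = false;
    explicit CountedTls(int i) : id(i) {}
};

class CountedThreadList {
public:
    std::shared_ptr<CountedTls> add(int id) {
        auto t = std::make_shared<CountedTls>(id);
        std::lock_guard<std::mutex> g(lock_);
        list_.push_back(t);
        return t;
    }
    // Takes a reference while the lock is held, so the result stays
    // valid after the lock has been dropped.
    std::shared_ptr<CountedTls> find(int id) {
        std::lock_guard<std::mutex> g(lock_);
        for (auto &t : list_)
            if (t->id == id) return t;
        return nullptr;
    }
    // Marks the entry as exiting and unlinks it; the object itself is
    // destroyed only when the last reference goes away.
    void remove(int id) {
        std::lock_guard<std::mutex> g(lock_);
        for (auto it = list_.begin(); it != list_.end(); ++it) {
            if ((*it)->id == id) {
                (*it)->exiting = true;
                list_.erase(it);
                return;
            }
        }
    }
private:
    std::mutex lock_;
    std::vector<std::shared_ptr<CountedTls>> list_;
};

// Returns true if a pointer obtained from find() is still usable after
// the owner has removed itself from the list.
bool found_entry_survives_removal() {
    CountedThreadList tl;
    tl.add(7);                // the returned handle is dropped immediately
    auto found = tl.find(7);  // the signal side takes its own reference
    tl.remove(7);             // the owner "exits" before signal processing
    return found && found->id == 7 && found->exiting &&
           found.use_count() == 1;
}
```

With this scheme the signal side can notice the exiting flag and skip the 
thread, instead of dereferencing memory that has already gone away.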

Mikulas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

