This is the mail archive of the
mailing list for the Cygwin project.
Re: Problem with pthreads and signaling, behavior broken...
- From: "Arash Partow" <arashp at hotmail dot com>
- To: cygwin at cygwin dot com
- Date: Fri, 12 Dec 2003 02:43:03 +0000
- Subject: Re: Problem with pthreads and signaling, behavior broken...
Sorry about the late reply. Here is the best I can do with my
Debugging 101 skills.
The problem (or possible bug) that I see when I run ThreadTest is the
following. I run ThreadTest and let it reach about 2k completed threads,
then hit ctrl+c. The main loop in the garbage collector (GC) exits, and
the next loop begins, which only deletes completed threads and does not
renew them. At this point EVERYTHING slows down. With the DLL from
19th Nov, the 2nd loop in the GC finishes about 2-3 seconds after you
press ctrl+c and the application closes gracefully. With the latest
DLLs, including the one from 11th Dec, this is not the case.
What happens once ctrl+c is pressed?
Well, it's simple: the threads continue running. Obviously no new threads
are being created, because the GC is in the final cleanup loop. What I
do notice (through TaskInfo) is that if I view the ThreadTest process in
flat view mode, which shows all the child threads the application has
created, there is one thread using 99% of the CPU, and no other
processes, including normal Windows processes, get any CPU time.
With cygwin1.dll from 19th Nov this behavior does not occur.
I saw that your code was doing something that made ThreadTest work with
the new DLL. I went through my original version of ThreadTest and found
that if I add the following lines inside the cleanup loop (the 2nd loop)
in the GC, it works with the current snapshot DLLs.
new 2nd loop:
int countloops = 0;                           <----------- new line of code
/* clean up any remaining threads */
while(threadList.size() > 0)
{
   vector <int> delPos;
   for(unsigned int i=0; i < threadList.size(); i++)
   {
      if (threadList[i]->getThreadState() == THREAD_DEAD)
         ...
   }
   /* Recalibrate deletion positions */
   for (unsigned int i=1; i < delPos.size(); i++)
      ...
   for (unsigned int i=0; i < delPos.size(); i++)
   {
      /* Erase thread and free-up memory */
      ...
   }
   if ((++countloops) % 1000 == 0)            <-----------|
      usleep(10000);                          <-----------| new lines of code
}
It seems that putting a delay in the loop fixes the problem. Why is
that? And why isn't the problem seen with the snapshot from 19th Nov?
Why do other *nix systems not show this problem, if it really is a
case of one thread hogging the CPU, as the need for a delay might
suggest?
I would like to get rid of this delay, but it seems the only way
ThreadTest will work properly is if the delay is there.
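
For anyone trying to reproduce this without the full ThreadTest source, here is a minimal sketch of a polling cleanup loop of the same general shape. It is not the original code: Worker, startWorker and drain are hypothetical names, std::thread stands in for the pthread wrapper, and the sleep happens only on passes that reap nothing instead of every 1000 passes. It illustrates how a polling loop can give up the CPU while it waits; it does not explain what changed between the Cygwin snapshots.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadTest's thread wrapper class.
enum ThreadState { THREAD_ALIVE, THREAD_DEAD };

struct Worker {
    std::atomic<ThreadState> state{THREAD_ALIVE};
    std::thread handle;

    ThreadState getThreadState() const { return state.load(); }
};

// Start a worker that sleeps for `millis` ms and then marks itself dead.
std::unique_ptr<Worker> startWorker(int millis) {
    std::unique_ptr<Worker> w(new Worker);
    Worker* raw = w.get();
    w->handle = std::thread([raw, millis] {
        std::this_thread::sleep_for(std::chrono::milliseconds(millis));
        raw->state.store(THREAD_DEAD);
    });
    return w;
}

// Final cleanup loop: reap dead workers until the list is empty.
// Returns how many passes found nothing to reap (and therefore slept).
std::size_t drain(std::vector<std::unique_ptr<Worker>>& threadList) {
    std::size_t idlePasses = 0;
    while (!threadList.empty()) {
        // Move the workers that are still alive to the front of the list.
        auto firstDead = std::partition(
            threadList.begin(), threadList.end(),
            [](const std::unique_ptr<Worker>& w) {
                return w->getThreadState() != THREAD_DEAD;
            });
        const bool reaped = (firstDead != threadList.end());

        // Join the dead workers, then erase them (which frees their memory).
        for (auto it = firstDead; it != threadList.end(); ++it) {
            (*it)->handle.join();
        }
        threadList.erase(firstDead, threadList.end());

        if (!reaped) {
            // Nothing finished on this pass: sleep instead of spinning.
            ++idlePasses;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    return idlePasses;
}
```

To try it, push a few startWorker(...) results into a vector and call drain on it. Without the sleep, the loop would poll getThreadState() as fast as it can while the remaining workers finish.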
PS: can you please send me the binary you have built
using the original ThreadTest code?
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
Btw, if someone can figure out precisely what they think is going wrong
with the ostensible problems that people are reporting, I'll be happy to
track this down.
I don't mean vagueness like "I think the signal isn't going to the
thread". I mean something concrete like "The signal handler is not
returning correctly to the calling thread, because..." or "The
keepRunning variable is not getting set".
It seems like if someone is seeing problems they could be sending some
debugging output to a file or something to help track the problems down.
You know? Debugging 101?
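
As a rough illustration of the kind of debugging output meant here, the following sketch logs to a file whether a keepRunning flag actually changes when SIGINT arrives. It is an assumption about how such a flag might be wired up, not ThreadTest's code: onInterrupt and runUntilInterrupted are hypothetical names, the log path is arbitrary, and std::raise stands in for the user pressing ctrl+c.

```cpp
#include <csignal>
#include <cstdio>

// Flag cleared by the signal handler. sig_atomic_t is the type that may be
// written safely from a handler.
volatile std::sig_atomic_t keepRunning = 1;

extern "C" void onInterrupt(int) {
    keepRunning = 0;
}

// Run a loop until the handler clears keepRunning, writing each step to
// logPath. SIGINT is raised from inside the loop at iteration raiseAt.
// Returns the number of iterations executed, or -1 if the log cannot be opened.
int runUntilInterrupted(const char* logPath, int raiseAt) {
    std::FILE* log = std::fopen(logPath, "w");
    if (log == nullptr) {
        return -1;
    }

    keepRunning = 1;
    std::signal(SIGINT, onInterrupt);
    std::fprintf(log, "handler installed, keepRunning=%d\n",
                 static_cast<int>(keepRunning));

    int iterations = 0;
    while (keepRunning) {
        ++iterations;
        if (iterations == raiseAt) {
            std::fprintf(log, "raising SIGINT at iteration %d\n", iterations);
            std::raise(SIGINT);  // the handler runs before raise() returns
            std::fprintf(log, "back from handler, keepRunning=%d\n",
                         static_cast<int>(keepRunning));
        }
    }

    std::fprintf(log, "loop exited after %d iterations\n", iterations);
    std::fclose(log);
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return iterations;
}
```

If the log shows the handler returning but keepRunning still set to 1, or never shows the "back from handler" line at all, that is the kind of concrete statement that can be acted on.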
I always try to fix problems that I can duplicate but I'm not going to
spend an inordinate amount of time trying to fix problems that I can't
duplicate in something that is so clearly a corner case. I can see that
ThreadTest is supposed to be creating a bunch of threads but, beyond
that, I'm not interested in learning its intricacies.
I just tried this on one more system and it got a SIGSEGV no matter what
version of the Cygwin DLL I used. The difference with this system is
that it is multi-processor (it's also running XP, FWIW). From the gdb
stack trace, it seems like the problem may be because some operation
which manipulates a global list is not properly thread safe but, like I
said, I really don't want to delve into the mysteries of this program so
I didn't debug it very thoroughly.
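
To make the "global list is not properly thread safe" suspicion concrete, here is a minimal sketch of guarding a shared list with a mutex. It does not claim to be the fix for ThreadTest: completedIds, recordCompletion and runWriters are hypothetical names, and which list is actually involved was not established from the stack trace.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical global list shared by all threads, with the mutex guarding it.
std::vector<int> completedIds;
std::mutex completedMutex;

// Every access to the list takes the lock first.
void recordCompletion(int id) {
    std::lock_guard<std::mutex> lock(completedMutex);
    completedIds.push_back(id);
}

std::size_t completedCount() {
    std::lock_guard<std::mutex> lock(completedMutex);
    return completedIds.size();
}

// Start `writers` threads that each append `perWriter` ids, wait for all of
// them, and return the final size of the list.
std::size_t runWriters(int writers, int perWriter) {
    {
        std::lock_guard<std::mutex> lock(completedMutex);
        completedIds.clear();
    }

    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w) {
        pool.emplace_back([w, perWriter] {
            for (int i = 0; i < perWriter; ++i) {
                recordCompletion(w * perWriter + i);
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return completedCount();
}
```

Without the lock, concurrent push_back calls on the same vector are a data race, and on a multi-processor machine that can show up as a crash like the SIGSEGV described above.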
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html