This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



Re: [RFC]: Ugly thread step situation


Daniel Jacobowitz wrote:
On Tue, Sep 14, 2004 at 05:44:47PM -0400, jjohnstn wrote:

I recently tracked down a problem with gdb on RHEL3 Linux regarding stepping threads. In some instances, lin-lwp.c is asked to step the thread of interest, and we then wait on all threads. Due to some form of race condition, the wait does not get back the trap from the stepped thread. If a number of other events are waiting (e.g. thread create events, other breakpoints), lin-lwp picks one of them instead.


Could you explain this bit a little more?  What comes back instead for
the thread that was stepping?  Do we stop it with a SIGSTOP?

Is there a testcase?


Attached. This was the test-case given for Red Hat Bugzilla bug 130896.


Basically, you break at a thread function and attempt to do a mix of nexts and continues. The failure is not deterministic, but it will eventually occur.

The following is the excerpt of the lin-lwp trace that I put in the Bugzilla bug. It shows where the error occurs. We get an event on another thread, stop the stepping thread and see it stopped. At that point we have a choice of breakpoints to choose from.

The program needs to be run with an argument of 4:

set debug lin-lwp 1
b synchronize
run 4
next
...

Breakpoint 1, synchronize (tid=3074096048) at gdbtest.C:18
18	    pthread_mutex_lock(&mutex);
(gdb) n
LLR: PTRACE_SINGLESTEP process 10066, 0 (resume event thread)
LLW: waitpid 10066 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 10066, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 10066.
SEL: Select single-step LWP 10066
LLW: trap_ptid is LWP 10066.
RC:  PTRACE_CONT LWP 10068, 0, 0 (resume sibling)
RC:  PTRACE_CONT LWP 10067, 0, 0 (resume sibling)
RC:  PTRACE_CONT LWP 10062, 0, 0 (resume sibling)
LLR: PTRACE_SINGLESTEP process 10066, 0 (resume event thread)
LLW: waitpid 10067 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 10067, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 10067.
SC:  kill LWP 10068 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SC:  kill LWP 10066 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SC:  kill LWP 10062 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
WL: waitpid LWP 10068 received Stopped (signal) (stopped)
WL: waitpid LWP 10066 received Trace/breakpoint trap (stopped)
PTRACE_CONT LWP 10066, 0, 0 (OK)
SWC: Candidate SIGTRAP event in LWP 10066
WL: waitpid LWP 10066 received Stopped (signal) (stopped)
WL: waitpid LWP 10062 received Trace/breakpoint trap (stopped)
PTRACE_CONT LWP 10062, 0, 0 (OK)
SWC: Candidate SIGTRAP event in LWP 10062
WL: waitpid LWP 10062 received Stopped (signal) (stopped)
FC: LP has pending status 00057f
FC: LP has pending status 00057f
SEL: Select single-step LWP 10066
CBC: Push back breakpoint for LWP 10067
CBC: Push back breakpoint for LWP 10062
LLW: trap_ptid is LWP 10066.
RC:  PTRACE_CONT LWP 10068, 0, 0 (resume sibling)
RC:  PTRACE_CONT LWP 10067, 0, 0 (resume sibling)
RC:  PTRACE_CONT LWP 10062, 0, 0 (resume sibling)
LLR: PTRACE_SINGLESTEP process 10066, 0 (resume event thread)
LLW: waitpid 10062 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 10062, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 10062.
SC:  kill LWP 10068 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SC:  kill LWP 10067 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SC:  kill LWP 10066 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
WL: waitpid LWP 10068 received Stopped (signal) (stopped)
WL: waitpid LWP 10067 received Trace/breakpoint trap (stopped)
PTRACE_CONT LWP 10067, 0, 0 (OK)
SWC: Candidate SIGTRAP event in LWP 10067
WL: waitpid LWP 10067 received Stopped (signal) (stopped)
WL: waitpid LWP 10066 received Stopped (signal) (stopped)
FC: LP has pending status 00057f
SEL: Found 2 SIGTRAP events, selecting #0  <=== should not happen
CBC: Push back breakpoint for LWP 10062
LLW: trap_ptid is LWP 10067.

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread -1231361104 (LWP 10067)]
0x080489cb in synchronize (tid=3063606192) at gdbtest.C:18
18	    pthread_mutex_lock(&mutex);
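
For readers decoding the trace: the "pending status 00057f" lines print a raw waitpid status in hex. The sketch below is not GDB code; stop_signal is a hypothetical helper, and it assumes the Linux status encoding used by the system in the trace. It shows that 0x057f means "stopped by signal 5", i.e. SIGTRAP.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper (not part of GDB): return the signal that
   stopped a traced child, or -1 if the wait status does not
   describe a stop.  */
int stop_signal(int status)
{
    if (WIFSTOPPED(status))
        return WSTOPSIG(status);
    return -1;
}
```

On Linux, stop_signal(0x057f) yields SIGTRAP, so each "FC" line in the trace is a trap that has been queued for an LWP and not yet reported.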



Now it gets interesting. infrun.c thinks the current thread is being stepped and isn't prepared for a breakpoint event coming back instead. On x86, it then miscalculates the pc: after a breakpoint the pc should be backed up by 1, whereas after a step no adjustment is needed. Because we didn't back up by 1, we end up pointing at an invalid pc, and everything falls apart from there.
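
The pc adjustment described above can be sketched roughly as follows. This is an illustration only, not the infrun.c logic: the enum, the constant name, and resume_pc are all hypothetical. The one fact it relies on is that the x86 breakpoint instruction (int3) is one byte long, so the pc reported after hitting it is one past the breakpoint address.

```c
#include <stdint.h>

/* Hypothetical classification of why a thread reported SIGTRAP.  */
enum trap_cause { TRAP_SINGLE_STEP, TRAP_BREAKPOINT };

/* Length of the x86 int3 breakpoint instruction, in bytes.  */
#define X86_BREAKPOINT_LEN 1

/* Return the pc the thread should be considered stopped at.
   After a breakpoint the reported pc is past the int3, so back it
   up; after a single-step the reported pc is already correct.  */
uint32_t resume_pc(uint32_t reported_pc, enum trap_cause cause)
{
    if (cause == TRAP_BREAKPOINT)
        return reported_pc - X86_BREAKPOINT_LEN;
    return reported_pc;
}
```

With the address from the trace, treating the event in LWP 10067 as a step leaves the pc at 0x080489cb, while treating it as the breakpoint it really was gives 0x080489ca.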

To fix this quickly, I added the accompanying patch to lin-lwp.c. What it does is ensure that we wait on any currently stepping lwp. In truth, this isn't as bad as it sounds. The lin-lwp code later on is set up to pick the stepping lwp over all other events. This just keeps the scenario above from occurring.
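
As a rough sketch of the idea only (this is not the patch itself): the struct, its fields, and both function names below are made up for illustration. If some LWP has been asked to single-step, wait on that LWP specifically; otherwise fall back to waiting on any child.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an LWP list entry.  */
struct lwp {
    int pid;        /* LWP id as passed to waitpid */
    bool stepping;  /* true if we resumed it with a single-step */
};

/* Return the first LWP that is currently stepping, or NULL.  */
const struct lwp *find_stepping_lwp(const struct lwp *lwps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (lwps[i].stepping)
            return &lwps[i];
    return NULL;
}

/* Return the pid argument to use for waitpid: the stepping LWP if
   there is one, otherwise -1, meaning "any child".  */
int wait_target(const struct lwp *lwps, size_t n)
{
    const struct lwp *lp = find_stepping_lwp(lwps, n);
    return lp != NULL ? lp->pid : -1;
}
```

With the LWPs from the trace (10062, 10066, 10067, 10068) and 10066 marked as stepping, wait_target returns 10066, so a breakpoint event in 10067 or 10062 cannot be reported ahead of the step.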

Obviously, this doesn't solve everything. Perhaps the decrement of the pc needs to be done only once we have established whether the thread has changed underneath us. We could also use a hook to walk the lwp list and find out whether the current lwp was stepping or encountered a breakpoint.

Anyway, if the consensus is that the patch is helpful in the short-term, I am more than happy to check it in.

-- Jeff J.

2004-09-14 Jeff Johnston <jjohnstn@redhat.com>

	* lin-lwp.c (find_singlestep_lwp_callback): New static function.
	(lin_lwp_wait): Change code to specifically wait on any LWP
	that is currently stepping.


This sounds sort of like a problem I debugged on MIPS and hppa, but
never managed to reproduce.  I had tabled the patch until I had more
time to look at it - always a mistake.

The same patch may help here.  Could you tell me what resume_ptid is
before the call to target_resume, in resume?  The call in which we
request the single-step, I mean.


I'll have to look into it.


-- Jeff J.


#ifdef __sun
#include <stream.h>
#else
#include <iostream>
using namespace std;
#endif

#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>


pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

// getitimer, setitimer
void synchronize(pthread_t tid)
{
    pthread_mutex_lock(&mutex);
    cout << "In synchronize() for thread " << tid << endl;
    pthread_mutex_unlock(&mutex);
    return;
}

void pthreadMain()
{
    try
    {
	while ( 1 )
	{
	    pthread_t tid = pthread_self();
 	    //cout << "Thread " << tid << " about to call synchronize()..." << endl;
	    synchronize(tid);
 	    //cout << "Thread " << tid << " done..." << endl;
	}
    }
    catch ( ... )
    {
	cout << "--> EXCEPTION ???" << endl;
    }
}

int main(int argc, char **argv)
{
    int numThreads;

    if ( argc == 2 )
    {
	numThreads = atoi(argv[1]);
    }
    else
    {
	cout << "Usage: server <num threads>" << '\n';
	return(1);
    }

    pthread_t *tid = new pthread_t[numThreads];
    int idx;

    for ( idx = 1; idx < numThreads; idx++ )
    {
	if ( pthread_create(tid+idx, NULL,(void*(*)(void*))pthreadMain, NULL) != 0 )
	    cout << "--> FAILED TO CREATE THREAD!!!" << endl;
    }

    for ( idx = 1; idx < numThreads; idx++ )
	pthread_join(tid[idx], NULL);

    cout << "--> EXECUTION COMPLETED" << endl;
    return(0);
}
