This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Fix gdbserver non-stop mode interrupt support, in the presence of internal stops, and pending statuses
- From: Pedro Alves <pedro at codesourcery dot com>
- To: gdb-patches at sourceware dot org
- Date: Fri, 30 Apr 2010 22:59:54 +0100
- Subject: Fix gdbserver non-stop mode interrupt support, in the presence of internal stops, and pending statuses
This one had me staring at gdbserver debug logs for a while...
There are a couple of issues with the vCont;t / "(gdb) interrupt" support
in Linux gdbserver. First, the vCont;t handling does:
  if (lwp->resume->kind == resume_stop)
    {
      if (!lwp->stopped)
        {
          ... <queue SIGSTOP> ...
        }
      else
        {
          if (debug_threads)
            fprintf (stderr, "already stopped LWP %ld\n",
                     lwpid_of (lwp));

          /* The LWP may have been stopped in an internal event that
             was not meant to be notified back to GDB (e.g., gdbserver
             breakpoint), so we should be reporting a stop event in
             this case too.  */

          /* If the thread already has a pending SIGSTOP, this is a
             no-op.  Otherwise, something later will presumably resume
             the thread and this will cause it to cancel any pending
             operation, due to last_resume_kind == resume_stop.  If
             the thread already has a pending status to report, we
             will still report it the next time we wait - see
             status_pending_p_callback.  */
          send_sigstop (lwp);
          ^^^^^^^^^^^^
        }
The intention was good, but send_sigstop actually does nothing
if lwp->stopped is set, so this never worked correctly.
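To make the no-op concrete, here is a minimal toy model of the two behaviours. It is only a sketch: struct toy_lwp and both functions are hypothetical stand-ins invented for illustration, not the real gdbserver definitions, and the "queue" is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for gdbserver's per-LWP state.
   The field names are illustrative assumptions only.  */
struct toy_lwp
{
  bool stopped;        /* The LWP is currently stopped.  */
  bool stop_expected;  /* A SIGSTOP is queued and not yet collected.  */
};

/* Behaviour before the patch: an already-stopped LWP is skipped, so
   no SIGSTOP ends up queued for it.  */
void
toy_send_sigstop_old (struct toy_lwp *lwp)
{
  if (lwp->stopped)
    return;

  lwp->stop_expected = true;
}

/* Behaviour after the patch: queue a SIGSTOP even if the LWP is
   stopped.  It is still a no-op if one is already pending.  */
void
toy_send_sigstop_new (struct toy_lwp *lwp)
{
  if (lwp->stop_expected)
    return;

  lwp->stop_expected = true;
}
```

Calling toy_send_sigstop_old on an LWP with stopped set leaves stop_expected clear, which is the case the vCont;t handling was relying on; toy_send_sigstop_new sets it.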
The second problem arises in the window between GDB requesting a stop
with "vCont;t" (which queues a SIGSTOP) and gdbserver actually collecting
that stop through the normal linux_wait path and reporting it to GDB. If
some other LWP hits an internal breakpoint in that window, such as a
tracepoint, we start a step-over-breakpoint dance for that other LWP, which
involves momentarily pausing all threads but the stepping one. That
pause-all-threads consumes the pending SIGSTOP we had queued for GDB's
vCont;t request and, after finishing the step-over, we end up leaving the
LWP stopped without reporting the stop to GDB. GDB is then out of sync,
thinking the LWP is still running.
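The sequence above can be walked through with a small state machine. This is a sketch under loud assumptions: every name below (struct toy_lwp, toy_request_stop, and so on) is hypothetical, and signals are modelled as plain flags rather than real ptrace events. It only illustrates how the queued SIGSTOP gets consumed, and how requeueing it when proceeding the LWP restores the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-LWP state; not the real gdbserver struct.  */
struct toy_lwp
{
  bool stopped;            /* LWP is currently stopped.  */
  bool stop_expected;      /* A SIGSTOP is queued, not yet collected.  */
  bool client_wants_stop;  /* GDB sent vCont;t for this LWP.  */
  bool stop_reported;      /* The stop was reported back to GDB.  */
};

/* vCont;t on a running LWP: remember the request, queue a SIGSTOP.  */
void
toy_request_stop (struct toy_lwp *lwp)
{
  lwp->client_wants_stop = true;
  lwp->stop_expected = true;
}

/* Pause-all for another LWP's step-over: make sure a SIGSTOP is
   queued, then wait for it, which consumes it.  */
void
toy_pause_for_step_over (struct toy_lwp *lwp)
{
  lwp->stop_expected = true;
  lwp->stopped = true;
  lwp->stop_expected = false;  /* Consumed by the internal wait.  */
}

/* Old proceed logic: leave any LWP the client wants stopped alone,
   even if its stop was never reported.  */
void
toy_proceed_old (struct toy_lwp *lwp)
{
  if (lwp->client_wants_stop)
    return;

  lwp->stopped = false;
}

/* Patched proceed logic: only leave it alone if the stop was already
   reported; otherwise requeue a SIGSTOP and let it run.  */
void
toy_proceed_new (struct toy_lwp *lwp)
{
  if (lwp->client_wants_stop && lwp->stop_reported)
    return;

  if (lwp->client_wants_stop)
    lwp->stop_expected = true;

  lwp->stopped = false;
}

/* Normal wait path: a running LWP with a queued SIGSTOP stops, and
   the stop is reported if the client asked for it.  Returns true if
   a stop was reported to the client.  */
bool
toy_wait (struct toy_lwp *lwp)
{
  if (lwp->stopped || !lwp->stop_expected)
    return false;

  lwp->stopped = true;
  lwp->stop_expected = false;
  if (lwp->client_wants_stop)
    {
      lwp->stop_reported = true;
      return true;
    }
  return false;
}
```

Running toy_request_stop, toy_pause_for_step_over, toy_proceed_old and then toy_wait leaves the toy LWP stopped with stop_reported still false; swapping in toy_proceed_new makes toy_wait report the stop.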
My simplified way to test this is to have a few threads (3, for example), each
running in a tight loop, with a tracepoint set in each loop.
This forces step-over-breakpoint operations for all the threads in turn,
pausing and unpausing threads behind GDB's back. Then, from GDB, I issue a
series of:
(gdb) continue -a&
(gdb) interrupt -a
(I actually put that in a user-defined command, and put a weight on
<enter> while I went for dinner.)
For a while, I see all 3 threads constantly reporting stops:
(gdb) interrupt -a
...
[Thread 14066] #1 stopped.
...
[Thread 14070] #2 stopped.
...
[Thread 14071] #3 stopped.
...
(gdb) c -a&
but eventually, the stars align in the right way, and one or two threads
start forgetting to report stops, and get stuck like so, in the
"running" state from GDB's perspective:
(gdb) info threads
3 Thread 14071 0x000000000040076a in thread_function1 (arg=0x1) at threads.c:81
2 Thread 14070 (running)
* 1 Thread 14066 0x00007fbc953e9cfd in pthread_join () from /lib/libpthread.so.0
No amount of "interrupt -a" or "continue -a" gets the thread
state unstuck.
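For anyone wanting to try the reproduction, a test program along the lines described above might look like the following. This is a hypothetical sketch, not the threads.c from the session shown: the names are made up, all workers share one function (so a single tracepoint on the increment line covers every worker), and the limit parameter is an addition so the loops can be bounded when not debugging.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_WORKERS 2  /* Plus the main thread makes 3 threads.  */

/* Per-worker loop counters; volatile so the loop is not optimized out.  */
volatile unsigned long toy_counters[TOY_NUM_WORKERS];

/* Iteration limit per worker; 0 means spin forever.  */
static atomic_ulong toy_limit;

static void *
toy_thread_function (void *arg)
{
  size_t idx = (size_t) (uintptr_t) arg;

  for (;;)
    {
      unsigned long limit = atomic_load (&toy_limit);

      if (limit != 0 && toy_counters[idx] >= limit)
        break;
      toy_counters[idx]++;  /* Candidate line for the tracepoint.  */
    }

  return NULL;
}

/* Start the workers and join them, the way the main thread in the
   session above sits in pthread_join.  Returns 0 on success, -1 if a
   thread could not be created.  */
int
toy_run_spinners (unsigned long limit)
{
  pthread_t threads[TOY_NUM_WORKERS];
  size_t created = 0;
  size_t i;
  int result = 0;

  atomic_store (&toy_limit, limit);

  for (i = 0; i < TOY_NUM_WORKERS; i++)
    {
      toy_counters[i] = 0;
      if (pthread_create (&threads[i], NULL, toy_thread_function,
                          (void *) (uintptr_t) i) != 0)
        {
          /* Force already-running workers to finish quickly.  */
          atomic_store (&toy_limit, 1);
          result = -1;
          break;
        }
      created++;
    }

  for (i = 0; i < created; i++)
    pthread_join (threads[i], NULL);

  return result;
}
```

Calling toy_run_spinners (0) from main gives endlessly spinning workers for a debugging session; a non-zero limit makes each worker stop after that many iterations. Build with the compiler's pthread option (for example -pthread with GCC).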
I've applied this patch to fix these issues.
I considered writing a test for this, but since it
involves non-stop and tracing, and is not
that easy to trigger, I gave up.
--
Pedro Alves
2010-04-30 Pedro Alves <pedro@codesourcery.com>
gdb/gdbserver/
* linux-low.c (linux_kill_one_lwp, linux_kill)
(linux_detach_one_lwp): Adjust to send_sigstop interface change.
(send_sigstop): Take an lwp_info as parameter instead. Queue a
SIGSTOP even if the LWP is stopped.
(send_sigstop_callback): New.
(stop_all_lwps): Use send_sigstop_callback instead.
(linux_resume_one_thread): Adjust.
(proceed_one_lwp): Still proceed an LWP that the client has
requested to stop, if we haven't reported it as stopped yet. Make
sure that LWPs the client wants stopped have a pending SIGSTOP.
---
gdb/gdbserver/linux-low.c | 58 ++++++++++++++++++++++++++++++++++------------
1 file changed, 44 insertions(+), 14 deletions(-)
Index: src/gdb/gdbserver/linux-low.c
===================================================================
--- src.orig/gdb/gdbserver/linux-low.c 2010-04-30 21:42:34.000000000 +0100
+++ src/gdb/gdbserver/linux-low.c 2010-04-30 22:28:18.000000000 +0100
@@ -192,7 +192,7 @@ static int linux_event_pipe[2] = { -1, -
/* True if we're currently in async mode. */
#define target_is_async_p() (linux_event_pipe[0] != -1)
-static void send_sigstop (struct inferior_list_entry *entry);
+static void send_sigstop (struct lwp_info *lwp);
static void wait_for_sigstop (struct inferior_list_entry *entry);
/* Accepts an integer PID; Returns a string representing a file that
@@ -741,7 +741,7 @@ linux_kill_one_lwp (struct inferior_list
/* If we're killing a running inferior, make sure it is stopped
first, as PTRACE_KILL will not work otherwise. */
if (!lwp->stopped)
- send_sigstop (&lwp->head);
+ send_sigstop (lwp);
do
{
@@ -781,7 +781,7 @@ linux_kill (int pid)
/* If we're killing a running inferior, make sure it is stopped
first, as PTRACE_KILL will not work otherwise. */
if (!lwp->stopped)
- send_sigstop (&lwp->head);
+ send_sigstop (lwp);
do
{
@@ -814,7 +814,7 @@ linux_detach_one_lwp (struct inferior_li
int lwpid = lwpid_of (lwp);
stopping_threads = 1;
- send_sigstop (&lwp->head);
+ send_sigstop (lwp);
/* If this detects a new thread through a clone event, the new
thread is appended to the end of the lwp list, so we'll
@@ -2020,14 +2020,10 @@ kill_lwp (unsigned long lwpid, int signo
}
static void
-send_sigstop (struct inferior_list_entry *entry)
+send_sigstop (struct lwp_info *lwp)
{
- struct lwp_info *lwp = (struct lwp_info *) entry;
int pid;
- if (lwp->stopped)
- return;
-
pid = lwpid_of (lwp);
/* If we already have a pending stop signal for this process, don't
@@ -2048,6 +2044,17 @@ send_sigstop (struct inferior_list_entry
}
static void
+send_sigstop_callback (struct inferior_list_entry *entry)
+{
+ struct lwp_info *lwp = (struct lwp_info *) entry;
+
+ if (lwp->stopped)
+ return;
+
+ send_sigstop (lwp);
+}
+
+static void
mark_lwp_dead (struct lwp_info *lwp, int wstat)
{
/* It's dead, really. */
@@ -2159,7 +2166,7 @@ static void
stop_all_lwps (void)
{
stopping_threads = 1;
- for_each_inferior (&all_lwps, send_sigstop);
+ for_each_inferior (&all_lwps, send_sigstop_callback);
for_each_inferior (&all_lwps, wait_for_sigstop);
stopping_threads = 0;
}
@@ -2661,7 +2668,7 @@ linux_resume_one_thread (struct inferior
/* Stop the thread, and wait for the event asynchronously,
through the event loop. */
- send_sigstop (&lwp->head);
+ send_sigstop (lwp);
}
else
{
@@ -2681,7 +2688,7 @@ linux_resume_one_thread (struct inferior
the thread already has a pending status to report, we
will still report it the next time we wait - see
status_pending_p_callback. */
- send_sigstop (&lwp->head);
+ send_sigstop (lwp);
}
/* For stop requests, we're done. */
@@ -2822,10 +2829,12 @@ proceed_one_lwp (struct inferior_list_en
thread = get_lwp_thread (lwp);
- if (thread->last_resume_kind == resume_stop)
+ if (thread->last_resume_kind == resume_stop
+ && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
{
if (debug_threads)
- fprintf (stderr, " client wants LWP %ld stopped\n", lwpid_of (lwp));
+ fprintf (stderr, " client wants LWP to remain %ld stopped\n",
+ lwpid_of (lwp));
return;
}
@@ -2844,6 +2853,27 @@ proceed_one_lwp (struct inferior_list_en
return;
}
+ if (thread->last_resume_kind == resume_stop)
+ {
+ /* We haven't reported this LWP as stopped yet (otherwise, the
+ last_status.kind check above would catch it, and we wouldn't
+ reach here. This LWP may have been momentarily paused by a
+ stop_all_lwps call while handling for example, another LWP's
+ step-over. In that case, the pending expected SIGSTOP signal
+ that was queued at vCont;t handling time will have already
+ been consumed by wait_for_sigstop, and so we need to requeue
+ another one here. Note that if the LWP already has a SIGSTOP
+ pending, this is a no-op. */
+
+ if (debug_threads)
+ fprintf (stderr,
+ "Client wants LWP %ld to stop. "
+ "Making sure it has a SIGSTOP pending\n",
+ lwpid_of (lwp));
+
+ send_sigstop (lwp);
+ }
+
step = thread->last_resume_kind == resume_step;
linux_resume_one_lwp (lwp, step, 0, NULL);
}