This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Fix gdbserver non-stop mode interrupt support, in the presence of internal stops, and pending statuses


This one had me staring at gdbserver debug logs for a while...

There are a couple of issues with the vCont;t / "(gdb) interrupt" support
in linux gdbserver.  First, the vCont;t handling does:

  if (lwp->resume->kind == resume_stop)
    {
      if (!lwp->stopped)
	{
... <queue SIGSTOP> ...
	}
      else
	{
	  if (debug_threads)
	    fprintf (stderr, "already stopped LWP %ld\n",
		     lwpid_of (lwp));

	  /* The LWP may have been stopped in an internal event that
	     was not meant to be notified back to GDB (e.g., gdbserver
	     breakpoint), so we should be reporting a stop event in
	     this case too.  */

	  /* If the thread already has a pending SIGSTOP, this is a
	     no-op.  Otherwise, something later will presumably resume
	     the thread and this will cause it to cancel any pending
	     operation, due to last_resume_kind == resume_stop.  If
	     the thread already has a pending status to report, we
	     will still report it the next time we wait - see
	     status_pending_p_callback.  */
	  send_sigstop (lwp);
              ^^^^^^^^^^^^
	}

The intention was good, but send_sigstop actually does nothing if
lwp->stopped is set, so this code path never actually worked
correctly.
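To make the first problem concrete, here is a toy C model of the old and new send_sigstop shapes.  It is a sketch under stated assumptions, not gdbserver code: struct model_lwp, its two flags and the old_/new_ function names are hypothetical, and setting sigstop_pending stands in for actually sending a SIGSTOP with kill_lwp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for gdbserver's
   struct lwp_info: only the two flags this bug depends on.  */
struct model_lwp
{
  bool stopped;          /* LWP is currently stopped under ptrace.  */
  bool sigstop_pending;  /* A SIGSTOP has been queued for it.  */
};

/* Old shape: returns early for an already-stopped LWP, so a
   vCont;t request for such an LWP queues nothing.  */
static void
old_send_sigstop (struct model_lwp *lwp)
{
  assert (lwp != NULL);
  if (lwp->stopped)
    return;
  /* Stands in for kill_lwp (pid, SIGSTOP); idempotent, like the
     real "already have a pending stop signal" check.  */
  lwp->sigstop_pending = true;
}

/* New shape: always makes sure a SIGSTOP is queued, even for a
   stopped LWP (it is delivered once the LWP is resumed).  */
static void
new_send_sigstop (struct model_lwp *lwp)
{
  assert (lwp != NULL);
  lwp->sigstop_pending = true;
}

/* The "skip stopped LWPs" filter now lives only in the callback
   that the stop-all loop iterates with.  */
static void
new_send_sigstop_callback (struct model_lwp *lwp)
{
  if (lwp->stopped)
    return;
  new_send_sigstop (lwp);
}
```

With the old shape, asking an already-stopped LWP to stop queues nothing.  With the new shape the SIGSTOP is always queued, while the callback used when stopping all LWPs still skips those that are already stopped, mirroring the send_sigstop/send_sigstop_callback split in the patch.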

The second problem is a race.  Suppose GDB requests a stop with
"vCont;t" (which queues a SIGSTOP), and before that stop is collected
by the normal linux_wait path and reported to GDB, some other LWP hits
an internal breakpoint, such as a tracepoint.  We then start a
step-over-breakpoint dance for that other LWP, which involves
momentarily pausing all threads but the stepping one.  That
pause-all-threads consumes the pending SIGSTOP we had queued for GDB's
vCont;t request, so after finishing the step-over we'd end up leaving
the LWP stopped without reporting the stop to GDB.  GDB is then out of
sync, thinking the LWP is still running.
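The race can be modeled the same way.  Again this is a hypothetical sketch rather than gdbserver code: struct race_lwp and every function below are made-up names, stop_reported stands in for thread->last_status.kind != TARGET_WAITKIND_IGNORE, and pause_for_step_over collapses stop_all_lwps plus wait_for_sigstop into a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not gdbserver code.  */
enum race_resume_kind { RACE_RESUME_CONTINUE, RACE_RESUME_STOP };

struct race_lwp
{
  enum race_resume_kind last_resume_kind;
  bool stop_reported;    /* Stop already reported to GDB.  */
  bool stopped;          /* Stopped under ptrace.  */
  bool sigstop_pending;  /* A SIGSTOP is on its way.  */
};

static void
queue_sigstop (struct race_lwp *lwp)
{
  assert (lwp != NULL);
  lwp->sigstop_pending = true;  /* No-op if one is already pending.  */
}

/* GDB sends vCont;t for a running LWP.  */
static void
handle_vcont_t (struct race_lwp *lwp)
{
  lwp->last_resume_kind = RACE_RESUME_STOP;
  queue_sigstop (lwp);
}

/* Another LWP needs a step-over: the LWP is paused and the SIGSTOP
   that was queued for GDB is consumed as an internal stop.  */
static void
pause_for_step_over (struct race_lwp *lwp)
{
  if (!lwp->stopped)
    queue_sigstop (lwp);
  lwp->stopped = true;
  lwp->sigstop_pending = false;
}

/* Before the patch: any LWP the client wants stopped is left alone.  */
static void
proceed_old (struct race_lwp *lwp)
{
  if (lwp->last_resume_kind == RACE_RESUME_STOP)
    return;
  lwp->stopped = false;
}

/* After the patch: only LWPs whose stop was already reported stay
   put; the others get a fresh SIGSTOP and are resumed so that it is
   delivered and collected.  */
static void
proceed_new (struct race_lwp *lwp)
{
  if (lwp->last_resume_kind == RACE_RESUME_STOP && lwp->stop_reported)
    return;
  if (lwp->last_resume_kind == RACE_RESUME_STOP)
    queue_sigstop (lwp);
  lwp->stopped = false;
}

/* True if a later wait will see a SIGSTOP from this LWP and can
   report the stop to GDB.  */
static bool
stop_will_be_reported (const struct race_lwp *lwp)
{
  return !lwp->stopped && lwp->sigstop_pending;
}
```

Running handle_vcont_t, pause_for_step_over and then proceed_old leaves the LWP stopped with no SIGSTOP in flight, which is the lost stop described above; proceed_new instead requeues the SIGSTOP and lets the LWP run so the stop can be collected and reported.  An LWP whose stop was already reported is still left alone.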

My simplified way to test this is to have a few threads (3, for
example), each running in a tight loop, with a tracepoint set inside
each loop.  This forces step-over-breakpoint operations for all the
threads in turn, pausing and unpausing threads behind GDB's back.
Then, from GDB, I issue a series of:

 (gdb) continue -a&
 (gdb) interrupt -a

(I actually put that in a user-defined command, and put a weight on
<enter> while I went for dinner.)
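A test program along these lines should be enough to reproduce the problem.  It is a hypothetical sketch, not the threads.c used in the session shown here: NUM_THREADS, struct worker, thread_function and run_workers are made-up names, and the loop is bounded only so that the program terminates.  For the actual reproduction you would make the iteration count very large (or loop forever) and set a tracepoint on the marked line.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 3

/* One worker per thread; each thread only touches its own struct.  */
struct worker
{
  unsigned long iterations;
  volatile unsigned long counter;
};

static void *
thread_function (void *arg)
{
  struct worker *w = arg;

  for (unsigned long i = 0; i < w->iterations; i++)
    w->counter++;  /* Put the tracepoint on this line.  */
  return NULL;
}

/* Runs NUM_THREADS busy-looping workers and returns the sum of their
   counters, or 0 if not every thread could be created.  */
static unsigned long
run_workers (unsigned long iterations)
{
  pthread_t threads[NUM_THREADS];
  struct worker workers[NUM_THREADS];
  unsigned long total = 0;
  int created;

  for (created = 0; created < NUM_THREADS; created++)
    {
      workers[created].iterations = iterations;
      workers[created].counter = 0;
      if (pthread_create (&threads[created], NULL, thread_function,
                          &workers[created]) != 0)
        break;
    }

  for (int i = 0; i < created; i++)
    {
      pthread_join (threads[i], NULL);
      /* Each thread owns its counter, so no increments can be lost.  */
      assert (workers[i].counter == iterations);
      total += workers[i].counter;
    }

  return created == NUM_THREADS ? total : 0;
}
```

Build it with debug info and thread support, e.g. gcc -g -pthread.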

For a while, I see all 3 threads constantly reporting stops:

 (gdb) interrupt -a
 ...
 [Thread 14066] #1 stopped.
 ...
 [Thread 14070] #2 stopped.
 ...
 [Thread 14071] #3 stopped.
 ...
 (gdb) c -a&

but eventually the stars align in the right way, and one or two
threads start forgetting to report stops and get stuck in the
"running" state from GDB's perspective, like so:

(gdb) info threads
  3 Thread 14071  0x000000000040076a in thread_function1 (arg=0x1) at threads.c:81
  2 Thread 14070  (running)
* 1 Thread 14066  0x00007fbc953e9cfd in pthread_join () from /lib/libpthread.so.0


No amount of "interrupt -a" or "continue -a" gets the thread
state unstuck.

I've applied this patch to fix these issues.

I considered writing a test for this, but since it involves
non-stop mode and tracing, and the bug is not easy to trigger,
I gave up.

-- 
Pedro Alves

2010-04-30  Pedro Alves  <pedro@codesourcery.com>

	gdb/gdbserver/
	* linux-low.c (linux_kill_one_lwp, linux_kill)
	(linux_detach_one_lwp): Adjust to send_sigstop interface change.
	(send_sigstop): Take an lwp_info as parameter instead.  Queue a
	SIGSTOP even if the LWP is stopped.
	(send_sigstop_callback): New.
	(stop_all_lwps): Use send_sigstop_callback instead.
	(linux_resume_one_thread): Adjust.
	(proceed_one_lwp): Still proceed an LWP that the client has
	requested to stop, if we haven't reported it as stopped yet.  Make
	sure that LWPs the client wants stopped have a pending SIGSTOP.

---
 gdb/gdbserver/linux-low.c |   58 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 14 deletions(-)

Index: src/gdb/gdbserver/linux-low.c
===================================================================
--- src.orig/gdb/gdbserver/linux-low.c	2010-04-30 21:42:34.000000000 +0100
+++ src/gdb/gdbserver/linux-low.c	2010-04-30 22:28:18.000000000 +0100
@@ -192,7 +192,7 @@ static int linux_event_pipe[2] = { -1, -
 /* True if we're currently in async mode.  */
 #define target_is_async_p() (linux_event_pipe[0] != -1)
 
-static void send_sigstop (struct inferior_list_entry *entry);
+static void send_sigstop (struct lwp_info *lwp);
 static void wait_for_sigstop (struct inferior_list_entry *entry);
 
 /* Accepts an integer PID; Returns a string representing a file that
@@ -741,7 +741,7 @@ linux_kill_one_lwp (struct inferior_list
   /* If we're killing a running inferior, make sure it is stopped
      first, as PTRACE_KILL will not work otherwise.  */
   if (!lwp->stopped)
-    send_sigstop (&lwp->head);
+    send_sigstop (lwp);
 
   do
     {
@@ -781,7 +781,7 @@ linux_kill (int pid)
   /* If we're killing a running inferior, make sure it is stopped
      first, as PTRACE_KILL will not work otherwise.  */
   if (!lwp->stopped)
-    send_sigstop (&lwp->head);
+    send_sigstop (lwp);
 
   do
     {
@@ -814,7 +814,7 @@ linux_detach_one_lwp (struct inferior_li
       int lwpid = lwpid_of (lwp);
 
       stopping_threads = 1;
-      send_sigstop (&lwp->head);
+      send_sigstop (lwp);
 
       /* If this detects a new thread through a clone event, the new
 	 thread is appended to the end of the lwp list, so we'll
@@ -2020,14 +2020,10 @@ kill_lwp (unsigned long lwpid, int signo
 }
 
 static void
-send_sigstop (struct inferior_list_entry *entry)
+send_sigstop (struct lwp_info *lwp)
 {
-  struct lwp_info *lwp = (struct lwp_info *) entry;
   int pid;
 
-  if (lwp->stopped)
-    return;
-
   pid = lwpid_of (lwp);
 
   /* If we already have a pending stop signal for this process, don't
@@ -2048,6 +2044,17 @@ send_sigstop (struct inferior_list_entry
 }
 
 static void
+send_sigstop_callback (struct inferior_list_entry *entry)
+{
+  struct lwp_info *lwp = (struct lwp_info *) entry;
+
+  if (lwp->stopped)
+    return;
+
+  send_sigstop (lwp);
+}
+
+static void
 mark_lwp_dead (struct lwp_info *lwp, int wstat)
 {
   /* It's dead, really.  */
@@ -2159,7 +2166,7 @@ static void
 stop_all_lwps (void)
 {
   stopping_threads = 1;
-  for_each_inferior (&all_lwps, send_sigstop);
+  for_each_inferior (&all_lwps, send_sigstop_callback);
   for_each_inferior (&all_lwps, wait_for_sigstop);
   stopping_threads = 0;
 }
@@ -2661,7 +2668,7 @@ linux_resume_one_thread (struct inferior
 
 	  /* Stop the thread, and wait for the event asynchronously,
 	     through the event loop.  */
-	  send_sigstop (&lwp->head);
+	  send_sigstop (lwp);
 	}
       else
 	{
@@ -2681,7 +2688,7 @@ linux_resume_one_thread (struct inferior
 	     the thread already has a pending status to report, we
 	     will still report it the next time we wait - see
 	     status_pending_p_callback.  */
-	  send_sigstop (&lwp->head);
+	  send_sigstop (lwp);
 	}
 
       /* For stop requests, we're done.  */
@@ -2822,10 +2829,12 @@ proceed_one_lwp (struct inferior_list_en
 
   thread = get_lwp_thread (lwp);
 
-  if (thread->last_resume_kind == resume_stop)
+  if (thread->last_resume_kind == resume_stop
+      && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
     {
       if (debug_threads)
-	fprintf (stderr, "   client wants LWP %ld stopped\n", lwpid_of (lwp));
+	fprintf (stderr, "   client wants LWP %ld to remain stopped\n",
+		 lwpid_of (lwp));
       return;
     }
 
@@ -2844,6 +2853,27 @@ proceed_one_lwp (struct inferior_list_en
       return;
     }
 
+  if (thread->last_resume_kind == resume_stop)
+    {
+      /* We haven't reported this LWP as stopped yet (otherwise, the
+	 last_status.kind check above would catch it, and we wouldn't
+	 reach here).  This LWP may have been momentarily paused by a
+	 stop_all_lwps call while handling, for example, another LWP's
+	 step-over.  In that case, the pending expected SIGSTOP signal
+	 that was queued at vCont;t handling time will have already
+	 been consumed by wait_for_sigstop, and so we need to requeue
+	 another one here.  Note that if the LWP already has a SIGSTOP
+	 pending, this is a no-op.  */
+
+      if (debug_threads)
+	fprintf (stderr,
+		 "Client wants LWP %ld to stop. "
+		 "Making sure it has a SIGSTOP pending\n",
+		 lwpid_of (lwp));
+
+      send_sigstop (lwp);
+    }
+
   step = thread->last_resume_kind == resume_step;
   linux_resume_one_lwp (lwp, step, 0, NULL);
 }

