This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: nptl threading patch for linux


Thanks Michael.  Patch checked in.  I deleted the ChangeLog entry
for thread-db.c which was already made into a separate patch.

-- Jeff J.

Michael Snyder wrote:
After reviewing the on-list discussion, it sounds like this patch
should go in.  Jeff, would you do the honors?

Michael

"J. Johnston" wrote:

The following is the last part of my revised nptl patch that has
been broken up per Daniel J.'s suggestion.  There are no generated
files included in the patch.

A bit of background is needed.  First of all, in the nptl model,
lwps and pids are distinct entities.  If you perform a ps on a
multithreaded application, you will not see its child threads show
up.  This differs from the linuxthreads model whereby an lwp was
actually a pid.  In the nptl model, you cannot issue a kill to
a child thread via its lwp.  If you want to do this, you need to use
the tkill syscall instead.  The action of sending a signal to a specific
lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
determine at configuration time if we have the tkill syscall and then at run-time,
if we can use it.  A new function kill_lwp has been added which either calls the
tkill syscall or the regular kill function as appropriate.

Another key difference in behavior between nptl and linuxthreads is what happens
when a thread exits.  In the linuxthreads model, each thread that exits causes
an exit event to occur.  The main thread can exit before all of its children.
In the nptl model, only the main thread generates the exit event and it only
does so after all child threads have exited.  So, to determine when an lwp
has exited, we have to constantly check its status when we are able.  When
we get an exit event we have to determine if we are exiting the program or
just one of the threads.  Some additional logic has been added to the exit
checking code such that if we have (lwp == pid), then we stop all active
threads and check if they have already exited.  If they have not exited,
we resume these threads, otherwise, we delete them.

Daniel J. brought up a good point regarding my previous attempt at this logic
whereby I stopped all threads and resumed all threads.  That old logic is wrong
when scheduler_locking is set on.  The new logic addresses this because
it only stops/resumes threads which are not already stopped.  Currently, if we lock
into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
because we will field the exit event and end up with no running threads.  Under nptl, we
never get notified of the thread exiting so for this scenario, we end up hung.
In this instance, there is nothing we can do without a kernel change.  Daniel J.
is looking into a kernel change.  For user threads, we could activate the death event
in thread-db.c and make a low-level call to notify the lwp layer, but this does not
handle lwps created directly by the end-user.  This issue needs to be discussed
further outside of this patch because I want to propose changing the assertion to
allow the user to resume the other threads.  I have tested the change with such a
scenario (locking a thread and running until exit).  For linuxthreads we trigger the
assertion as before.  For nptl, we hang as expected as we never get the exit event.
A bug can be opened for this if the patch is accepted.

The last piece of logic has to do with a new interface needed by the nptl libthread_db
library to get the thread area.  I have grouped this together with the lin-lwp changes
because the lin-lwp changes cannot work without this crucial change.

This patch has been tested for both linuxthreads and with an nptl kernel.

Ok to commit?

-- Jeff J.

2003-04-24 Jeff Johnston <jjohnstn@redhat.com>

       * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
       * config.in: Regenerated.
       * configure.in: Add test for syscall function and check for
       __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
       * configure: Regenerated.
       * thread-db.c (check_event): For create/death event breakpoints,
       loop through all messages to ensure that we read the message
       corresponding to the breakpoint we are at.
       * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
       <sys/syscall.h>.
       (kill_lwp): New function that uses tkill syscall or
       uses kill, depending on whether threading model is nptl or not.
       All callers of kill() changed to use kill_lwp().
       (lin_lwp_wait): Make special check when WIFEXITED occurs to
       see if all threads have already exited in the nptl model.
       (stop_and_resume_callback): New callback function used by the
       lin_lwp_wait thread exit handling code.
       (stop_wait_callback): Check for threads already having exited and
       delete such threads fromt the lwp list when discovered.
       (stop_callback): Don't assert retcode of kill call.

       Roland McGrath  <roland@redhat.com>
       * i386-linux-nat.c (ps_get_thread_area): New function needed by
       nptl libthread_db.

 -------------------------------------------------------------------------------
Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.43
diff -u -r1.43 lin-lwp.c
--- lin-lwp.c   28 Mar 2003 21:42:41 -0000      1.43
+++ lin-lwp.c   23 Apr 2003 22:52:44 -0000
@@ -24,6 +24,10 @@
#include "gdb_string.h"
#include <errno.h>
#include <signal.h>
+#ifdef HAVE_TKILL_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+#endif
#include <sys/ptrace.h>
#include "gdb_wait.h"

@@ -156,6 +160,7 @@

/* Prototypes for local functions.  */
static int stop_wait_callback (struct lwp_info *lp, void *data);
+static int lin_lwp_thread_alive (ptid_t ptid);

/* Convert wait status STATUS to a string.  Used for printing debug
   messages only.  */
@@ -627,6 +632,32 @@
}


+/* Issue kill to specified lwp. */ + +static int tkill_failed; + +static int +kill_lwp (int lwpid, int signo) +{ + errno = 0; + +/* Use tkill, if possible, in case we are using nptl threads. If tkill + fails, then we are not using nptl threads and we should be using kill. */ + +#ifdef HAVE_TKILL_SYSCALL + if (!tkill_failed) + { + int ret = syscall (__NR_tkill, lwpid, signo); + if (errno != ENOSYS) + return ret; + errno = 0; + tkill_failed = 1; + } +#endif + + return kill (lwpid, signo); +} + /* Send a SIGSTOP to LP. */

static int
@@ -642,8 +673,15 @@
                             "SC:  kill %s **<SIGSTOP>**\n",
                             target_pid_to_str (lp->ptid));
       }
-      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
-      gdb_assert (ret == 0);
+      errno = 0;
+      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
+      if (debug_lin_lwp)
+       {
+         fprintf_unfiltered (gdb_stdlog,
+                             "SC:  lwp kill %d %s\n",
+                             ret,
+                             errno ? safe_strerror (errno) : "ERRNO-OK");
+       }

      lp->signalled = 1;
      gdb_assert (lp->status == 0);
@@ -667,11 +705,23 @@

gdb_assert (lp->status == 0);

-      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
+      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
      if (pid == -1 && errno == ECHILD)
-       /* OK, the proccess has disappeared.  We'll catch the actual
-          exit event in lin_lwp_wait.  */
-       return 0;
+       {
+         pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
+         if (pid == -1 && errno == ECHILD)
+           {
+             /* The thread has previously exited.  We need to delete it now
+                because in the case of nptl threads, there won't be an
+                exit event unless it is the main thread.  */
+             if (debug_lin_lwp)
+               fprintf_unfiltered (gdb_stdlog,
+                                   "SWC: %s exited.\n",
+                                   target_pid_to_str (lp->ptid));
+             delete_lwp (lp->ptid);
+             return 0;
+           }
+       }

gdb_assert (pid == GET_LWP (lp->ptid));

@@ -683,6 +733,7 @@
                             status_to_str (status));
       }

+      /* Check if the thread has exited.  */
      if (WIFEXITED (status) || WIFSIGNALED (status))
       {
         gdb_assert (num_lwps > 1);
@@ -697,7 +748,31 @@
                                target_pid_to_str (lp->ptid));
           }
         if (debug_lin_lwp)
-           fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
+           fprintf_unfiltered (gdb_stdlog,
+                               "SWC: %s exited.\n",
+                               target_pid_to_str (lp->ptid));
+
+         delete_lwp (lp->ptid);
+         return 0;
+       }
+
+      /* Check if the current LWP has previously exited.  For nptl threads,
+         there is no exit signal issued for LWPs that are not the
+         main thread so we should check whenever the thread is stopped.  */
+      if (!lin_lwp_thread_alive (lp->ptid))
+       {
+         if (in_thread_list (lp->ptid))
+           {
+             /* Core GDB cannot deal with us deleting the current
+                thread.  */
+             if (!ptid_equal (lp->ptid, inferior_ptid))
+               delete_thread (lp->ptid);
+             printf_unfiltered ("[%s exited]\n",
+                                target_pid_to_str (lp->ptid));
+           }
+         if (debug_lin_lwp)
+           fprintf_unfiltered (gdb_stdlog,
+                               "SWC: %s already exited.\n",
                               target_pid_to_str (lp->ptid));

         delete_lwp (lp->ptid);
@@ -756,7 +831,14 @@
             /* If there's another event, throw it back into the queue. */
             if (lp->status)
               {
-                 kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
+                 if (debug_lin_lwp)
+                   {
+                     fprintf_unfiltered (gdb_stdlog,
+                                         "SWC: kill %s, %s\n",
+                                         target_pid_to_str (lp->ptid),
+                                         status_to_str ((int) status));
+                   }
+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
               }
             /* Save the sigtrap event. */
             lp->status = status;
@@ -800,7 +882,7 @@
                                         target_pid_to_str (lp->ptid),
                                         status_to_str ((int) status));
                   }
-                 kill (GET_LWP (lp->ptid), WSTOPSIG (status));
+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
               }
             return 0;
           }
@@ -1049,6 +1131,25 @@

#endif

+/* Stop an active thread, verify it still exists, then resume it.  */
+
+static int
+stop_and_resume_callback (struct lwp_info *lp, void *data)
+{
+  struct lwp_info *ptr;
+
+  if (!lp->stopped && !lp->signalled)
+    {
+      stop_callback (lp, NULL);
+      stop_wait_callback (lp, NULL);
+      /* Resume if the lwp still exists.  */
+      for (ptr = lwp_list; ptr; ptr = ptr->next)
+       if (lp == ptr)
+         resume_callback (lp, NULL);
+    }
+  return 0;
+}
+
static ptid_t
lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
{
@@ -1206,10 +1307,61 @@
               }
           }

-         /* Make sure we don't report a TARGET_WAITKIND_EXITED or
-            TARGET_WAITKIND_SIGNALLED event if there are still LWP's
-            left in the process.  */
+         /* Check if the thread has exited.  */
         if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
+           {
+             if (in_thread_list (lp->ptid))
+               {
+                 /* Core GDB cannot deal with us deleting the current
+                    thread.  */
+                 if (!ptid_equal (lp->ptid, inferior_ptid))
+                   delete_thread (lp->ptid);
+                 printf_unfiltered ("[%s exited]\n",
+                                    target_pid_to_str (lp->ptid));
+               }
+
+             /* If this is the main thread, we must stop all threads and
+                verify if they are still alive.  This is because in the nptl
+                thread model, there is no signal issued for exiting LWPs
+                other than the main thread.  We only get the main thread
+                exit signal once all child threads have already exited.
+                If we stop all the threads and use the stop_wait_callback
+                to check if they have exited we can determine whether this
+                signal should be ignored or whether it means the end of the
+                debugged application, regardless of which threading model
+                is being used.  */
+             if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
+               {
+                 lp->stopped = 1;
+                 iterate_over_lwps (stop_and_resume_callback, NULL);
+               }
+
+             if (debug_lin_lwp)
+               fprintf_unfiltered (gdb_stdlog,
+                                   "LLW: %s exited.\n",
+                                   target_pid_to_str (lp->ptid));
+
+             delete_lwp (lp->ptid);
+
+             /* If there is at least one more LWP, then the exit signal
+                was not the end of the debugged application and should be
+                ignored.  */
+             if (num_lwps > 0)
+               {
+                 /* Make sure there is at least one thread running.  */
+                 gdb_assert (iterate_over_lwps (running_callback, NULL));
+
+                 /* Discard the event.  */
+                 status = 0;
+                 continue;
+               }
+           }
+
+         /* Check if the current LWP has previously exited.  In the nptl
+            thread model, LWPs other than the main thread do not issue
+            signals when they exit so we must check whenever the thread
+            has stopped.  A similar check is made in stop_wait_callback().  */
+         if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
           {
             if (in_thread_list (lp->ptid))
               {
Index: acconfig.h
===================================================================
RCS file: /cvs/src/src/gdb/acconfig.h,v
retrieving revision 1.24
diff -u -r1.24 acconfig.h
--- acconfig.h  4 Jan 2003 00:34:42 -0000       1.24
+++ acconfig.h  23 Apr 2003 22:52:44 -0000
@@ -95,6 +95,9 @@
/* Define if using Solaris thread debugging.  */
#undef HAVE_THREAD_DB_LIB

+/* Define if you support the tkill syscall.  */
+#undef HAVE_TKILL_SYSCALL
+
/* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
#undef START_INFERIOR_TRAPS_EXPECTED
#undef sys_quotactl
Index: configure.in
===================================================================
RCS file: /cvs/src/src/gdb/configure.in,v
retrieving revision 1.126
diff -u -r1.126 configure.in
--- configure.in        26 Feb 2003 15:10:47 -0000      1.126
+++ configure.in        23 Apr 2003 22:52:44 -0000
@@ -384,6 +384,7 @@
AC_CHECK_FUNCS(setpgid setpgrp)
AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
AC_CHECK_FUNCS(socketpair)
+AC_CHECK_FUNCS(syscall)

dnl AC_FUNC_SETPGRP does not work when cross compiling
dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
@@ -909,6 +910,24 @@
if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
  AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
            [Define if <thread_db.h> has the TD_NOTALLOC error code.])
+fi
+
+dnl See if we have a sys/syscall header file that has __NR_tkill.
+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
+   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
+                  gdb_cv_sys_syscall_h_has_tkill,
+     AC_TRY_COMPILE(
+       [#include <sys/syscall.h>],
+       [int i = __NR_tkill;],
+       gdb_cv_sys_syscall_h_has_tkill=yes,
+       gdb_cv_sys_syscall_h_has_tkill=no
+     )
+   )
+fi
+dnl See if we can issue tkill syscall.
+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
+  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
+            [Define if we can use the tkill syscall.])
fi

dnl Handle optional features that can be enabled.
Index: i386-linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
retrieving revision 1.44
diff -u -r1.44 i386-linux-nat.c
--- i386-linux-nat.c    16 Apr 2003 15:22:02 -0000      1.44
+++ i386-linux-nat.c    23 Apr 2003 22:52:44 -0000
@@ -70,6 +70,9 @@
/* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
#include "i386-linux-tdep.h"

+/* Defines ps_err_e, struct ps_prochandle.  */
+#include "gdb_proc_service.h"
+
/* Prototypes for local functions.  */
static void dummy_sse_values (void);

@@ -682,6 +685,21 @@
         offsetof (struct user, u_debugreg[regnum]), value);
  if (errno != 0)
    perror_with_name ("Couldn't write debug register");
+}
+
+extern ps_err_e
+ps_get_thread_area(const struct ps_prochandle *ph,
+                  lwpid_t lwpid, int idx, void **base)
+{
+  unsigned long int desc[3];
+#define PTRACE_GET_THREAD_AREA 25
+
+  if  (ptrace (PTRACE_GET_THREAD_AREA,
+              lwpid, (void *) idx, (unsigned long) &desc) < 0)
+    return PS_ERR;
+
+  *(int *)base = desc[1];
+  return PS_OK;
}

void





Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]