This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: nptl threading patch for linux


After reviewing the on-list discussion, it sounds like this patch
should go in.  Jeff, would you do the honors?

Michael

"J. Johnston" wrote:
> 
> The following is the last part of my revised nptl patch that has
> been broken up per Daniel J.'s suggestion.  There are no generated
> files included in the patch.
> 
> A bit of background is needed.  First of all, in the nptl model,
> lwps and pids are distinct entities.  If you perform a ps on a
> multithreaded application, you will not see its child threads show
> up.  This differs from the linuxthreads model whereby an lwp was
> actually a pid.  In the nptl model, you cannot issue a kill to
> a child thread via its lwp.  If you want to do this, you need to use
> the tkill syscall instead.  The action of sending a signal to a specific
> lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
> determine at configuration time if we have the tkill syscall and then at run-time,
> if we can use it.  A new function kill_lwp has been added which either calls the
> tkill syscall or the regular kill function as appropriate.
> 
> Another key difference in behavior between nptl and linuxthreads is what happens
> when a thread exits.  In the linuxthreads model, each thread that exits causes
> an exit event to occur.  The main thread can exit before all of its children.
> In the nptl model, only the main thread generates the exit event and it only
> does so after all child threads have exited.  So, to determine when an lwp
> has exited, we have to constantly check its status when we are able.  When
> we get an exit event we have to determine if we are exiting the program or
> just one of the threads.  Some additional logic has been added to the exit
> checking code such that if we have (lwp == pid), then we stop all active
> threads and check if they have already exited.  If they have not exited,
> we resume these threads, otherwise, we delete them.
> 
> Daniel J. brought up a good point regarding my previous attempt at this logic
> whereby I stopped all threads and resumed all threads.  That old logic is wrong
> when scheduler_locking is set on.  The new logic addresses this because
> it only stops/resumes threads which are not already stopped.  Currently, if we lock
> into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
> because we will field the exit event and end up with no running threads.  Under nptl, we
> never get notified of the thread exiting so for this scenario, we end up hung.
> In this instance, there is nothing we can do without a kernel change.  Daniel J.
> is looking into a kernel change.  For user threads, we could activate the death event
> in thread-db.c and make a low-level call to notify the lwp layer, but this does not
> handle lwps created directly by the end-user.  This issue needs to be discussed
> further outside of this patch because I want to propose changing the assertion to
> allow the user to resume the other threads.  I have tested the change with such a
> scenario (locking a thread and running until exit).  For linuxthreads we trigger the
> assertion as before.  For nptl, we hang as expected as we never get the exit event.
> A bug can be opened for this if the patch is accepted.
> 
> The last piece of logic has to do with a new interface needed by the nptl libthread_db
> library to get the thread area.  I have grouped this together with the lin-lwp changes
> because the lin-lwp changes cannot work without this crucial change.
> 
> This patch has been tested for both linuxthreads and with an nptl kernel.
> 
> Ok to commit?
> 
> -- Jeff J.
> 
> 2003-04-24  Jeff Johnston  <jjohnstn@redhat.com>
> 
>         * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
>         * config.in: Regenerated.
>         * configure.in: Add test for syscall function and check for
>         __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
>         * configure: Regenerated.
>         * thread-db.c (check_event): For create/death event breakpoints,
>         loop through all messages to ensure that we read the message
>         corresponding to the breakpoint we are at.
>         * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
>         <sys/syscall.h>.
>         (kill_lwp): New function that uses tkill syscall or
>         uses kill, depending on whether threading model is nptl or not.
>         All callers of kill() changed to use kill_lwp().
>         (lin_lwp_wait): Make special check when WIFEXITED occurs to
>         see if all threads have already exited in the nptl model.
>         (stop_and_resume_callback): New callback function used by the
>         lin_lwp_wait thread exit handling code.
>         (stop_wait_callback): Check for threads already having exited and
>         delete such threads fromt the lwp list when discovered.
>         (stop_callback): Don't assert retcode of kill call.
> 
>         Roland McGrath  <roland@redhat.com>
>         * i386-linux-nat.c (ps_get_thread_area): New function needed by
>         nptl libthread_db.
> 
>   -------------------------------------------------------------------------------
> Index: lin-lwp.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/lin-lwp.c,v
> retrieving revision 1.43
> diff -u -r1.43 lin-lwp.c
> --- lin-lwp.c   28 Mar 2003 21:42:41 -0000      1.43
> +++ lin-lwp.c   23 Apr 2003 22:52:44 -0000
> @@ -24,6 +24,10 @@
>  #include "gdb_string.h"
>  #include <errno.h>
>  #include <signal.h>
> +#ifdef HAVE_TKILL_SYSCALL
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#endif
>  #include <sys/ptrace.h>
>  #include "gdb_wait.h"
> 
> @@ -156,6 +160,7 @@
> 
>  /* Prototypes for local functions.  */
>  static int stop_wait_callback (struct lwp_info *lp, void *data);
> +static int lin_lwp_thread_alive (ptid_t ptid);
> 
>  /* Convert wait status STATUS to a string.  Used for printing debug
>     messages only.  */
> @@ -627,6 +632,32 @@
>  }
> 
> 
> +/* Issue kill to specified lwp.  */
> +
> +static int tkill_failed;
> +
> +static int
> +kill_lwp (int lwpid, int signo)
> +{
> +  errno = 0;
> +
> +/* Use tkill, if possible, in case we are using nptl threads.  If tkill
> +   fails, then we are not using nptl threads and we should be using kill.  */
> +
> +#ifdef HAVE_TKILL_SYSCALL
> +  if (!tkill_failed)
> +    {
> +      int ret = syscall (__NR_tkill, lwpid, signo);
> +      if (errno != ENOSYS)
> +       return ret;
> +      errno = 0;
> +      tkill_failed = 1;
> +    }
> +#endif
> +
> +  return kill (lwpid, signo);
> +}
> +
>  /* Send a SIGSTOP to LP.  */
> 
>  static int
> @@ -642,8 +673,15 @@
>                               "SC:  kill %s **<SIGSTOP>**\n",
>                               target_pid_to_str (lp->ptid));
>         }
> -      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
> -      gdb_assert (ret == 0);
> +      errno = 0;
> +      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
> +      if (debug_lin_lwp)
> +       {
> +         fprintf_unfiltered (gdb_stdlog,
> +                             "SC:  lwp kill %d %s\n",
> +                             ret,
> +                             errno ? safe_strerror (errno) : "ERRNO-OK");
> +       }
> 
>        lp->signalled = 1;
>        gdb_assert (lp->status == 0);
> @@ -667,11 +705,23 @@
> 
>        gdb_assert (lp->status == 0);
> 
> -      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
> +      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
>        if (pid == -1 && errno == ECHILD)
> -       /* OK, the proccess has disappeared.  We'll catch the actual
> -          exit event in lin_lwp_wait.  */
> -       return 0;
> +       {
> +         pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
> +         if (pid == -1 && errno == ECHILD)
> +           {
> +             /* The thread has previously exited.  We need to delete it now
> +                because in the case of nptl threads, there won't be an
> +                exit event unless it is the main thread.  */
> +             if (debug_lin_lwp)
> +               fprintf_unfiltered (gdb_stdlog,
> +                                   "SWC: %s exited.\n",
> +                                   target_pid_to_str (lp->ptid));
> +             delete_lwp (lp->ptid);
> +             return 0;
> +           }
> +       }
> 
>        gdb_assert (pid == GET_LWP (lp->ptid));
> 
> @@ -683,6 +733,7 @@
>                               status_to_str (status));
>         }
> 
> +      /* Check if the thread has exited.  */
>        if (WIFEXITED (status) || WIFSIGNALED (status))
>         {
>           gdb_assert (num_lwps > 1);
> @@ -697,7 +748,31 @@
>                                  target_pid_to_str (lp->ptid));
>             }
>           if (debug_lin_lwp)
> -           fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
> +           fprintf_unfiltered (gdb_stdlog,
> +                               "SWC: %s exited.\n",
> +                               target_pid_to_str (lp->ptid));
> +
> +         delete_lwp (lp->ptid);
> +         return 0;
> +       }
> +
> +      /* Check if the current LWP has previously exited.  For nptl threads,
> +         there is no exit signal issued for LWPs that are not the
> +         main thread so we should check whenever the thread is stopped.  */
> +      if (!lin_lwp_thread_alive (lp->ptid))
> +       {
> +         if (in_thread_list (lp->ptid))
> +           {
> +             /* Core GDB cannot deal with us deleting the current
> +                thread.  */
> +             if (!ptid_equal (lp->ptid, inferior_ptid))
> +               delete_thread (lp->ptid);
> +             printf_unfiltered ("[%s exited]\n",
> +                                target_pid_to_str (lp->ptid));
> +           }
> +         if (debug_lin_lwp)
> +           fprintf_unfiltered (gdb_stdlog,
> +                               "SWC: %s already exited.\n",
>                                 target_pid_to_str (lp->ptid));
> 
>           delete_lwp (lp->ptid);
> @@ -756,7 +831,14 @@
>               /* If there's another event, throw it back into the queue. */
>               if (lp->status)
>                 {
> -                 kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
> +                 if (debug_lin_lwp)
> +                   {
> +                     fprintf_unfiltered (gdb_stdlog,
> +                                         "SWC: kill %s, %s\n",
> +                                         target_pid_to_str (lp->ptid),
> +                                         status_to_str ((int) status));
> +                   }
> +                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>                 }
>               /* Save the sigtrap event. */
>               lp->status = status;
> @@ -800,7 +882,7 @@
>                                           target_pid_to_str (lp->ptid),
>                                           status_to_str ((int) status));
>                     }
> -                 kill (GET_LWP (lp->ptid), WSTOPSIG (status));
> +                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
>                 }
>               return 0;
>             }
> @@ -1049,6 +1131,25 @@
> 
>  #endif
> 
> +/* Stop an active thread, verify it still exists, then resume it.  */
> +
> +static int
> +stop_and_resume_callback (struct lwp_info *lp, void *data)
> +{
> +  struct lwp_info *ptr;
> +
> +  if (!lp->stopped && !lp->signalled)
> +    {
> +      stop_callback (lp, NULL);
> +      stop_wait_callback (lp, NULL);
> +      /* Resume if the lwp still exists.  */
> +      for (ptr = lwp_list; ptr; ptr = ptr->next)
> +       if (lp == ptr)
> +         resume_callback (lp, NULL);
> +    }
> +  return 0;
> +}
> +
>  static ptid_t
>  lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
>  {
> @@ -1206,10 +1307,61 @@
>                 }
>             }
> 
> -         /* Make sure we don't report a TARGET_WAITKIND_EXITED or
> -            TARGET_WAITKIND_SIGNALLED event if there are still LWP's
> -            left in the process.  */
> +         /* Check if the thread has exited.  */
>           if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
> +           {
> +             if (in_thread_list (lp->ptid))
> +               {
> +                 /* Core GDB cannot deal with us deleting the current
> +                    thread.  */
> +                 if (!ptid_equal (lp->ptid, inferior_ptid))
> +                   delete_thread (lp->ptid);
> +                 printf_unfiltered ("[%s exited]\n",
> +                                    target_pid_to_str (lp->ptid));
> +               }
> +
> +             /* If this is the main thread, we must stop all threads and
> +                verify if they are still alive.  This is because in the nptl
> +                thread model, there is no signal issued for exiting LWPs
> +                other than the main thread.  We only get the main thread
> +                exit signal once all child threads have already exited.
> +                If we stop all the threads and use the stop_wait_callback
> +                to check if they have exited we can determine whether this
> +                signal should be ignored or whether it means the end of the
> +                debugged application, regardless of which threading model
> +                is being used.  */
> +             if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
> +               {
> +                 lp->stopped = 1;
> +                 iterate_over_lwps (stop_and_resume_callback, NULL);
> +               }
> +
> +             if (debug_lin_lwp)
> +               fprintf_unfiltered (gdb_stdlog,
> +                                   "LLW: %s exited.\n",
> +                                   target_pid_to_str (lp->ptid));
> +
> +             delete_lwp (lp->ptid);
> +
> +             /* If there is at least one more LWP, then the exit signal
> +                was not the end of the debugged application and should be
> +                ignored.  */
> +             if (num_lwps > 0)
> +               {
> +                 /* Make sure there is at least one thread running.  */
> +                 gdb_assert (iterate_over_lwps (running_callback, NULL));
> +
> +                 /* Discard the event.  */
> +                 status = 0;
> +                 continue;
> +               }
> +           }
> +
> +         /* Check if the current LWP has previously exited.  In the nptl
> +            thread model, LWPs other than the main thread do not issue
> +            signals when they exit so we must check whenever the thread
> +            has stopped.  A similar check is made in stop_wait_callback().  */
> +         if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
>             {
>               if (in_thread_list (lp->ptid))
>                 {
> Index: acconfig.h
> ===================================================================
> RCS file: /cvs/src/src/gdb/acconfig.h,v
> retrieving revision 1.24
> diff -u -r1.24 acconfig.h
> --- acconfig.h  4 Jan 2003 00:34:42 -0000       1.24
> +++ acconfig.h  23 Apr 2003 22:52:44 -0000
> @@ -95,6 +95,9 @@
>  /* Define if using Solaris thread debugging.  */
>  #undef HAVE_THREAD_DB_LIB
> 
> +/* Define if you support the tkill syscall.  */
> +#undef HAVE_TKILL_SYSCALL
> +
>  /* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
>  #undef START_INFERIOR_TRAPS_EXPECTED
>  #undef sys_quotactl
> Index: configure.in
> ===================================================================
> RCS file: /cvs/src/src/gdb/configure.in,v
> retrieving revision 1.126
> diff -u -r1.126 configure.in
> --- configure.in        26 Feb 2003 15:10:47 -0000      1.126
> +++ configure.in        23 Apr 2003 22:52:44 -0000
> @@ -384,6 +384,7 @@
>  AC_CHECK_FUNCS(setpgid setpgrp)
>  AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
>  AC_CHECK_FUNCS(socketpair)
> +AC_CHECK_FUNCS(syscall)
> 
>  dnl AC_FUNC_SETPGRP does not work when cross compiling
>  dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
> @@ -909,6 +910,24 @@
>  if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
>    AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
>              [Define if <thread_db.h> has the TD_NOTALLOC error code.])
> +fi
> +
> +dnl See if we have a sys/syscall header file that has __NR_tkill.
> +if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
> +   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
> +                  gdb_cv_sys_syscall_h_has_tkill,
> +     AC_TRY_COMPILE(
> +       [#include <sys/syscall.h>],
> +       [int i = __NR_tkill;],
> +       gdb_cv_sys_syscall_h_has_tkill=yes,
> +       gdb_cv_sys_syscall_h_has_tkill=no
> +     )
> +   )
> +fi
> +dnl See if we can issue tkill syscall.
> +if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
> +  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
> +            [Define if we can use the tkill syscall.])
>  fi
> 
>  dnl Handle optional features that can be enabled.
> Index: i386-linux-nat.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
> retrieving revision 1.44
> diff -u -r1.44 i386-linux-nat.c
> --- i386-linux-nat.c    16 Apr 2003 15:22:02 -0000      1.44
> +++ i386-linux-nat.c    23 Apr 2003 22:52:44 -0000
> @@ -70,6 +70,9 @@
>  /* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
>  #include "i386-linux-tdep.h"
> 
> +/* Defines ps_err_e, struct ps_prochandle.  */
> +#include "gdb_proc_service.h"
> +
>  /* Prototypes for local functions.  */
>  static void dummy_sse_values (void);
> 
> @@ -682,6 +685,21 @@
>           offsetof (struct user, u_debugreg[regnum]), value);
>    if (errno != 0)
>      perror_with_name ("Couldn't write debug register");
> +}
> +
> +extern ps_err_e
> +ps_get_thread_area(const struct ps_prochandle *ph,
> +                  lwpid_t lwpid, int idx, void **base)
> +{
> +  unsigned long int desc[3];
> +#define PTRACE_GET_THREAD_AREA 25
> +
> +  if  (ptrace (PTRACE_GET_THREAD_AREA,
> +              lwpid, (void *) idx, (unsigned long) &desc) < 0)
> +    return PS_ERR;
> +
> +  *(int *)base = desc[1];
> +  return PS_OK;
>  }
> 
>  void


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]