This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[rfc, v2] Fix frame_id_inner comparison false positives


Daniel Jacobowitz wrote:

> I had a concern about this patch, which I couldn't explain well in
> person - let me see if I can do better with examples.

Thanks for looking at this, and sorry for the long delay in my
getting back to this patch ...

> > - dummy_frame_push checks for stale dummy frame IDs.  As suggested
> >   in a comment by Andrew, this can simply do a frame_find_by_id check.
>
> frame_find_by_id is a loop calling get_prev_frame until we run out of
> stack.  By default there is no limit on the number of frames GDB will
> unwind.  So if the stack is very deep, or if the unwinder is confused
> and the resulting stack is essentially infinite (a common failure
> mode), frame_find_by_id may take a very long time.  frame_find_by_id
> is constant time (though, as you've noted, sometimes wrong).

I think the frame_id_inner check in dummy_frame_push is simply wrong
anyway.  On the one hand, there are false positives on targets where
the range of valid stack addresses is non-contiguous (e.g. with 
sigaltstack, or in a multi-arch setup).  On the other hand, even so
there are common situations where there are false negatives: if you
simply call multiple inferior functions one after the other from the
same stack frame, their dummy IDs will all have the same stack address,
and therefore the frame_id_inner check will not recognize the old 
dummy frames may be cleaned up.

Overall, I guess I still agree with Andrew's comment that at this point,
we just should do a check whether the frame ID is still valid.

On the other hand, you're probably right that we need to take care to
avoid unnecessary looping/backtracing -- but that should be fixed *in*
frame_find_by_id.

> On the other hand, if the user has set a backtrace depth limit, then
> frame_find_by_id may fail if a function with a deep stack is called.
> We'll discard the dummy frame when it's still valid and get an extra
> sigtrap.  This behavior should be straightforward to write a test case
> for: set the backtrace limit to ten, call a function which recurses
> twenty times and then hits a breakpoint, call another function, raise
> the backtrace limit and see if we've lost the outer dummy frame.

This seems simply a bug of frame_find_by_id.  This function should be
using get_prev_frame_1 instead of get_prev_frame.  There is no reason
why re-identifying a frame you had already selected previously should
suddenly fail just because some user-interface setting was changed ...

> This is a limit on the number of cycles frame_find_by_id will travel
> if we're in a new location or if we have a confused unwinder.
> Suppose we record a varobj at frame sp==0x1000 (stack grows down).
> We continue.  Next time we hit a breakpoint sp==0xfc0.  The frontend
> requests a varobj update.  We unwind, find the next frame is
> sp==0x1200.  The old frame is gone; we stop.  Good thing, since if
> we'd kept backtracing we'd find the stack went all the way up to
> 0xff00... it'd take seconds or minutes to get up there.

Agreed, we should keep some sanity check here.  See below ...

> I think the part using frame_id_inner is another corrupt stack check,
> and the chances of it forming an exact cycle are slim to none.  But
> it's clear that there's a problem with the multi-stack or multi-arch
> cases that you're fixing.

This part is actually OK, as long as cross-arch frames are treated just
like signal trampoline frames are today.

> Is there some other way we can avoid unnecessary backtracing?  Maybe a
> predicate which determines whether two consecutive frames are
> reasonably on the same stack - in the presence of sigaltstack that
> could be quite fragile though.

Even in the presence of non-contiguous stack ranges, we can still
keep a significant set of safety-net changes because for NORMAL
frames changes in the stack address still follow predictable rules:

   * If frame NEXT is the immediate inner frame to THIS, and NEXT
     is a NORMAL frame, then the stack address of NEXT must be
     inner-than-or-equal to the stack address of THIS.

     Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
     error has occurred.

   * If frame NEXT is the immediate inner frame to THIS, and NEXT
     is a NORMAL frame, and NEXT and THIS have different stack
     addresses, no other frame in the frame chain may have a stack
     address in between.

     Therefore, if frame_id_inner (TEST, THIS) holds, but
     frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
     to a valid frame in the frame chain. 

This will catch for example the case you mention below, while at
the same time never yielding false positives in non-contiguous
stack situations.

The patch below implements this logic.  Note that frame_id_inner
is now only used in frame.c, so I've made it static -- due to the
somewhat fragile logic, it's probably better if is isn't used
outside the core frame code anyway ...

What do you think?

Tested on powerpc-linux, powerpc64-linux, and spu-elf with no
regressions.

Bye,
Ulrich

ChangeLog:

        * frame.h (struct frame_id): Update comments.
        (frame_id_inner): Remove prototype.
        * frame.c (frame_id_inner): Make static.  Add comments.
        (frame_find_by_id): Update frame_id_inner safety net check to avoid
	false positives for targets using non-contiguous stack ranges.
	Use get_prev_frame_1 instead of get_prev_frame.
        (get_prev_frame_1): Update frame_id_inner safety net check.

	* dummy-frame.c (dummy_frame_push): Use frame_find_by_id to
	detect stale dummy frames.
	* stack.c (return_command): Directly pop the selected frame.
	* infrun.c (handle_inferior_event): Remove dead code.
	* i386-tdep.c (i386_push_dummy_call): Update comment.

diff -urNp gdb-orig/gdb/dummy-frame.c gdb-head/gdb/dummy-frame.c
--- gdb-orig/gdb/dummy-frame.c	2008-05-01 01:19:59.000000000 +0200
+++ gdb-head/gdb/dummy-frame.c	2008-08-21 19:34:49.716119838 +0200
@@ -87,7 +87,6 @@ void
 dummy_frame_push (struct regcache *caller_regcache,
 		  const struct frame_id *dummy_id)
 {
-  struct gdbarch *gdbarch = get_regcache_arch (caller_regcache);
   struct dummy_frame *dummy_frame;
 
   /* Check to see if there are stale dummy frames, perhaps left over
@@ -95,8 +94,7 @@ dummy_frame_push (struct regcache *calle
      the debugger.  */
   dummy_frame = dummy_frame_stack;
   while (dummy_frame)
-    /* FIXME: cagney/2004-08-02: Should just test IDs.  */
-    if (frame_id_inner (gdbarch, dummy_frame->id, (*dummy_id)))
+    if (frame_find_by_id (dummy_frame->id) == NULL)
       /* Stale -- destroy!  */
       {
 	dummy_frame_stack = dummy_frame->next;
diff -urNp gdb-orig/gdb/frame.c gdb-head/gdb/frame.c
--- gdb-orig/gdb/frame.c	2008-08-05 22:16:25.000000000 +0200
+++ gdb-head/gdb/frame.c	2008-08-21 20:39:52.682528854 +0200
@@ -368,7 +368,33 @@ frame_id_eq (struct frame_id l, struct f
   return eq;
 }
 
-int
+/* Safety net to check whether frame ID L should be inner to
+   frame ID R, according to their stack addresses.
+
+   This method cannot be used to compare arbitrary frames, as the
+   ranges of valid stack addresses may be discontiguous (e.g. due
+   to sigaltstack).
+
+   However, it can be used as safety net to discover invalid frame
+   IDs in certain circumstances.
+
+   * If frame NEXT is the immediate inner frame to THIS, and NEXT
+     is a NORMAL frame, then the stack address of NEXT must be
+     inner-than-or-equal to the stack address of THIS.
+
+     Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
+     error has occurred.
+
+   * If frame NEXT is the immediate inner frame to THIS, and NEXT
+     is a NORMAL frame, and NEXT and THIS have different stack
+     addresses, no other frame in the frame chain may have a stack
+     address in between.
+
+     Therefore, if frame_id_inner (TEST, THIS) holds, but
+     frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
+     to a valid frame in the frame chain.   */
+
+static int
 frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)
 {
   int inner;
@@ -395,28 +421,34 @@ frame_id_inner (struct gdbarch *gdbarch,
 struct frame_info *
 frame_find_by_id (struct frame_id id)
 {
-  struct frame_info *frame;
+  struct frame_info *frame, *prev_frame;
 
   /* ZERO denotes the null frame, let the caller decide what to do
      about it.  Should it instead return get_current_frame()?  */
   if (!frame_id_p (id))
     return NULL;
 
-  for (frame = get_current_frame ();
-       frame != NULL;
-       frame = get_prev_frame (frame))
+  for (frame = get_current_frame (); ; frame = prev_frame)
     {
       struct frame_id this = get_frame_id (frame);
       if (frame_id_eq (id, this))
 	/* An exact match.  */
 	return frame;
-      if (frame_id_inner (get_frame_arch (frame), id, this))
-	/* Gone to far.  */
+
+      prev_frame = get_prev_frame_1 (frame);
+      if (!prev_frame)
+	return NULL;
+
+      /* As a safety net to avoid unnecessary backtracing while trying
+	 to find an invalid ID, we check for a common situation where
+	 we can detect from comparing stack addresses that no other
+	 frame in the current frame chain can have this ID.  See the
+	 comment at frame_id_inner for details.   */
+      if (get_frame_type (frame) == NORMAL_FRAME
+	  && !frame_id_inner (get_frame_arch (frame), id, this)
+	  && frame_id_inner (get_frame_arch (prev_frame), id,
+			     get_frame_id (prev_frame)))
 	return NULL;
-      /* Either we're not yet gone far enough out along the frame
-         chain (inner(this,id)), or we're comparing frameless functions
-         (same .base, different .func, no test available).  Struggle
-         on until we've definitly gone to far.  */
     }
   return NULL;
 }
@@ -1222,11 +1254,10 @@ get_prev_frame_1 (struct frame_info *thi
 
   /* Check that this frame's ID isn't inner to (younger, below, next)
      the next frame.  This happens when a frame unwind goes backwards.
-     Exclude signal trampolines (due to sigaltstack the frame ID can
-     go backwards) and sentinel frames (the test is meaningless).  */
-  if (this_frame->next->level >= 0
-      && this_frame->next->unwind->type != SIGTRAMP_FRAME
-      && frame_id_inner (get_frame_arch (this_frame), this_id,
+     This check is valid only if the next frame is NORMAL.  See the
+     comment at frame_id_inner for details.  */
+  if (this_frame->next->unwind->type == NORMAL_FRAME
+      && frame_id_inner (get_frame_arch (this_frame->next), this_id,
 			 get_frame_id (this_frame->next)))
     {
       if (frame_debug)
diff -urNp gdb-orig/gdb/frame.h gdb-head/gdb/frame.h
--- gdb-orig/gdb/frame.h	2008-08-05 22:16:25.000000000 +0200
+++ gdb-head/gdb/frame.h	2008-08-21 19:51:46.145791293 +0200
@@ -111,7 +111,7 @@ struct frame_id
      frames that do not change the stack but are still distinct and have 
      some form of distinct identifier (e.g. the ia64 which uses a 2nd 
      stack for registers).  This field is treated as unordered - i.e. will
-     not be used in frame ordering comparisons such as frame_id_inner().
+     not be used in frame ordering comparisons.
 
      This field is valid only if special_addr_p is true.  Otherwise, this
      frame is considered to have a wildcard special address, i.e. one that
@@ -124,22 +124,7 @@ struct frame_id
   unsigned int special_addr_p : 1;
 };
 
-/* Methods for constructing and comparing Frame IDs.
-
-   NOTE: Given stackless functions A and B, where A calls B (and hence
-   B is inner-to A).  The relationships: !eq(A,B); !eq(B,A);
-   !inner(A,B); !inner(B,A); all hold.
-
-   This is because, while B is inner-to A, B is not strictly inner-to A.  
-   Being stackless, they have an identical .stack_addr value, and differ 
-   only by their unordered .code_addr and/or .special_addr values.
-
-   Because frame_id_inner is only used as a safety net (e.g.,
-   detect a corrupt stack) the lack of strictness is not a problem.
-   Code needing to determine an exact relationship between two frames
-   must instead use frame_id_eq and frame_id_unwind.  For instance,
-   in the above, to determine that A stepped-into B, the equation
-   "A.id != B.id && A.id == id_unwind (B)" can be used.  */
+/* Methods for constructing and comparing Frame IDs.  */
 
 /* For convenience.  All fields are zero.  */
 extern const struct frame_id null_frame_id;
@@ -176,12 +161,6 @@ extern int frame_id_p (struct frame_id l
    either L or R have a zero .func, then the same frame base.  */
 extern int frame_id_eq (struct frame_id l, struct frame_id r);
 
-/* Returns non-zero when L is strictly inner-than R (they have
-   different frame .bases).  Neither L, nor R can be `null'.  See note
-   above about frameless functions.  */
-extern int frame_id_inner (struct gdbarch *gdbarch, struct frame_id l,
-			   struct frame_id r);
-
 /* Write the internal representation of a frame ID on the specified
    stream.  */
 extern void fprint_frame_id (struct ui_file *file, struct frame_id id);
diff -urNp gdb-orig/gdb/i386-tdep.c gdb-head/gdb/i386-tdep.c
--- gdb-orig/gdb/i386-tdep.c	2008-08-12 19:13:15.000000000 +0200
+++ gdb-head/gdb/i386-tdep.c	2008-08-21 19:34:49.736116974 +0200
@@ -1712,8 +1712,8 @@ i386_push_dummy_call (struct gdbarch *gd
      (i386_frame_this_id, i386_sigtramp_frame_this_id,
      i386_dummy_id).  It's there, since all frame unwinders for
      a given target have to agree (within a certain margin) on the
-     definition of the stack address of a frame.  Otherwise
-     frame_id_inner() won't work correctly.  Since DWARF2/GCC uses the
+     definition of the stack address of a frame.  Otherwise frame id
+     comparison might not work correctly.  Since DWARF2/GCC uses the
      stack address *before* the function call as a frame's CFA.  On
      the i386, when %ebp is used as a frame pointer, the offset
      between the contents %ebp and the CFA as defined by GCC.  */
diff -urNp gdb-orig/gdb/infrun.c gdb-head/gdb/infrun.c
--- gdb-orig/gdb/infrun.c	2008-08-08 20:14:08.000000000 +0200
+++ gdb-head/gdb/infrun.c	2008-08-21 19:34:49.796108383 +0200
@@ -3284,33 +3284,6 @@ infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME (
   tss->current_line = stop_pc_sal.line;
   tss->current_symtab = stop_pc_sal.symtab;
 
-  /* In the case where we just stepped out of a function into the
-     middle of a line of the caller, continue stepping, but
-     step_frame_id must be modified to current frame */
-#if 0
-  /* NOTE: cagney/2003-10-16: I think this frame ID inner test is too
-     generous.  It will trigger on things like a step into a frameless
-     stackless leaf function.  I think the logic should instead look
-     at the unwound frame ID has that should give a more robust
-     indication of what happened.  */
-  if (step - ID == current - ID)
-    still stepping in same function;
-  else if (step - ID == unwind (current - ID))
-    stepped into a function;
-  else
-    stepped out of a function;
-  /* Of course this assumes that the frame ID unwind code is robust
-     and we're willing to introduce frame unwind logic into this
-     function.  Fortunately, those days are nearly upon us.  */
-#endif
-  {
-    struct frame_info *frame = get_current_frame ();
-    struct frame_id current_frame = get_frame_id (frame);
-    if (!(frame_id_inner (get_frame_arch (frame), current_frame,
-			  step_frame_id)))
-      step_frame_id = current_frame;
-  }
-
   if (debug_infrun)
      fprintf_unfiltered (gdb_stdlog, "infrun: keep going\n");
   keep_going (ecs);
diff -urNp gdb-orig/gdb/stack.c gdb-head/gdb/stack.c
--- gdb-orig/gdb/stack.c	2008-08-05 22:16:26.000000000 +0200
+++ gdb-head/gdb/stack.c	2008-08-21 19:34:49.850100651 +0200
@@ -1855,29 +1855,8 @@ If you continue, the return value that y
 	error (_("Not confirmed"));
     }
 
-  /* NOTE: cagney/2003-01-18: Is this silly?  Rather than pop each
-     frame in turn, should this code just go straight to the relevant
-     frame and pop that?  */
-
-  /* First discard all frames inner-to the selected frame (making the
-     selected frame current).  */
-  {
-    struct frame_id selected_id = get_frame_id (get_selected_frame (NULL));
-    while (!frame_id_eq (selected_id, get_frame_id (get_current_frame ())))
-      {
-	struct frame_info *frame = get_current_frame ();
-	if (frame_id_inner (get_frame_arch (frame), selected_id,
-			    get_frame_id (frame)))
-	  /* Caught in the safety net, oops!  We've gone way past the
-             selected frame.  */
-	  error (_("Problem while popping stack frames (corrupt stack?)"));
-	frame_pop (get_current_frame ());
-      }
-  }
-
-  /* Second discard the selected frame (which is now also the current
-     frame).  */
-  frame_pop (get_current_frame ());
+  /* Discard the selected frame and all frames inner-to it.  */
+  frame_pop (get_selected_frame (NULL));
 
   /* Store RETURN_VALUE in the just-returned register set.  */
   if (return_value != NULL)

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]