This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Re: RFC: partial symbol table address range generalization

To: Jim Blandy <jimb at zwingli dot cygnus dot com>
Subject: Re: RFC: partial symbol table address range generalization
From: Daniel Jacobowitz <drow at mvista dot com>
Date: Wed, 24 Oct 2001 20:39:17 -0400
Cc: gdb-patches at sources dot redhat dot com
References: <20011023233450.09F855E9D8@zwingli.cygnus.com> <20011023195430.A15242@nevyn.them.org> <npsnc965cl.fsf@zwingli.cygnus.com> <20011024004201.A22007@nevyn.them.org> <npbsiw60ne.fsf@zwingli.cygnus.com>
On Wed, Oct 24, 2001 at 07:09:25PM -0500, Jim Blandy wrote:
> 
> Daniel Jacobowitz <drow@mvista.com> writes:
> > > > This particular problem should be avoidable anyway.  I would appreciate
> > > > it if you would look at:
> > > >  <http://sources.redhat.com/ml/gdb/2001-09/msg00068.html>
> > > 
> > > Well, okay.  But I'd much prefer to see the problem fixed by making
> > > the symbol table contents more accurate than by adding another
> > > heuristic for recognizing insane data.  When checks like that get into
> > > the code, they never go away, because nobody's ever really sure
> > > whether the circumstances it was meant to cope with happen any more.
> > 
> > Well, I wouldn't call it a heuristic in this case.  If we don't have
> > debugging info, isn't it a little bit insane to try to find a line
> > number?
> 
> Yes, certainly.  But the reasoning I see going on is this: we can't
> trust our address range data, and our source line search screws up if
> the address range data isn't accurate, so we'll prop up the process by
> pulling a name search into the process, too.  The more checks we add,
> the more likely we are to detect that something's screwey.
> 
> This works, but I'd really rather just make the address range data
> accurate in the first place.  The part that strikes me as a
> `heuristic' is the assumption that the address ranges are not quite
> trustworthy, and that therefore we should cross-check them against the
> function table before we proceed.
> 
> If you think it's going to be too hard to get reliable address ranges
> for symtabs, then it'd certainly be better to put Peter's change in;
> at least something will get fixed.  But if it's practical, I think
> it's better policy to work hard at filling your data structures with
> reliable data than to sprinkle your query code with sanity checks.

Well, for at least STABS we're never going to get reliable address
ranges.  The data we need is not there.  Any inference will be
guesswork at best.

But that isn't really my point.  What the code in question
(find_pc_sect_line) does is:

/* Find the source file and line number for a given PC value and SECTION.
   Return a structure containing a symtab pointer, a line number,
   and a pc range for the entire source line.

If we don't know what function we're in, then it isn't just that we can
not trust the address range data; we know absolutely that we will not
find a valid line number.  I'm pretty sure that find_pc_function can
fail in reasonable ways; in trampolines, for instance.  It may well be
that we won't reach find_pc_sect_line in any of those cases, ideally,
but I'm not convinced of that.

I'd prefer to both work hard to fill the symtabs with reliable data and
also sprinkle sanity checks where they clearly seem to be needed.

> > For performance reasons, it might be better to steal the iterator
> > concept and have an opaque cookie separate from the start/end
> > addresses.  Otherwise, addrset_next_range is not going to be constant
> > time, and that could get annoying in an objfile with a large number of
> > sections... and C++ is notorious for absurd numbers of sections.
> 
> It is (usually) constant time.  :)  The representation maintains a
> `current position' in the sorted linked list of ranges, and starts its
> searches from that point.  So in a series of operations where each
> address is near the last one (like a sequential scan), each operation
> takes place in constant time.
> 
> This is somewhat like having a single iterator object hidden inside
> the addrset object; that is, at any given time, there's only one range
> of addresses which can be operated on quickly.  If instead we had an
> explicit iterator type, one could do multiple traversals of different
> areas efficiently.  However, mutations can delete nodes from the list,
> and change their values; it seems to me that keeping iterators valid
> across mutation operations would add complexity.
> 
> Anyway, it's an opaque type.  If performance is a problem, or if we
> want to add iterators, we can revamp the whole thing without breaking
> its users.
> 
> Here's the implementation:
> 
> 
> /* addrset.c --- implementation of the address set datatype  
>    Copyright 2001 Free Software Foundation, Inc.
> 
>    This file is part of GDB.
> 
>    This program is free software; you can redistribute it and/or modify
>    it under the terms of the GNU General Public License as published by
>    the Free Software Foundation; either version 2 of the License, or
>    (at your option) any later version.
> 
>    This program is distributed in the hope that it will be useful,
>    but WITHOUT ANY WARRANTY; without even the implied warranty of
>    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>    GNU General Public License for more details.
> 
>    You should have received a copy of the GNU General Public License
>    along with this program; if not, write to the Free Software
>    Foundation, Inc., 59 Temple Place - Suite 330,
>    Boston, MA 02111-1307, USA.  */
> 
> #include "defs.h"
> #include "obstack.h"
> #include "gdb_assert.h"
> #include "addrset.h"
> 
> struct addrset {
> 
>   /* Two singly-linked lists of nodes; each node represents a range of
>      addresses in the addrset.  None of these nodes' address ranges
>      overlap or abut each other.  `before' is sorted in order of
>      decreasing addresses, `after' in order of increasing addresses.
>      All nodes in `before' have lower addresses than all the nodes in
>      `after'.
> 
>      Generally, whenever we operate on the list, we try split it
>      between `before' and `after' so that the division comes directly
>      after the last node we operated on.  That way, sequential access
>      happens in constant time.  Since we split it *after* the last
>      node we operate on, we're slightly biased towards traversal
>      towards increasing address, but it's not too bad if we want to go
>      the other way.  */
>   struct node *before, *after;
> 
>   /* If this is non-zero, allocate new nodes from this obstack.
>      Otherwise, allocate nodes using xmalloc.  */
>   struct obstack *obstack;
> 
>   /* We can't free nodes allocated from an obstack, so we might as
>      well keep them in a free list, and reuse them.  This list can't
>      be global, as different obstacks have different lifetimes.
>      Ideally, it would be per-obstack, so a node could be freed from
>      one addrset and then re-used by another addrset in the same
>      obstack.  But we don't have any convenient way to store
>      per-obstack info.  So the free list is per-addrset.
> 
>      For addrsets allocated using xmalloc, we simply xfree nodes, and
>      hope malloc manages small blocks well; this field is always zero.  */
>   struct node *free;
> 
> };
> 
> /* One contiguous range of addresses in an addrset.  */
> struct node {
>   struct node *next;
> 
>   /* The start and end addresses, inclusive, of the address range.
>      Obviously, start <= end.  If start == end, that's a one-byte
>      range.  */
>   CORE_ADDR start, end;       
> };
> 
> 
> struct addrset *
> new_addrset (struct obstack *obstack)
> {
>   struct addrset *new;
> 
>   if (obstack)
>     new = obstack_alloc (obstack, sizeof (*new));
>   else
>     new = xmalloc (sizeof (*new));
> 
>   memset (new, 0, sizeof (*new));
> 
>   new->obstack = obstack;
> 
>   return new;
> }
> 
> 
> static void
> xfree_node_list (struct node *n)
> {
>   while (n)
>     {
>       struct node *next = n->next;
> 
>       xfree (n);
>       n = next;
>     }
> }
> 
> 
> void
> addrset_free (struct addrset *addrset)
> {
>   /* You can't free an addrset allocated in an obstack.  */
>   gdb_assert (! addrset->obstack);
> 
>   /* If it wasn't allocated on an obstack, it had better not have a
>      free list.  */
>   gdb_assert (! addrset->free);
> 
>   xfree_node_list (addrset->before);
>   xfree_node_list (addrset->after);
> }
> 
> 
> static struct node *
> new_node (struct addrset *addrset,
>           CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *new;
> 
>   if (addrset->obstack)
>     {
>       /* If we have any nodes on the free list, use them.  */
>       if (addrset->free)
>         {
>           new = addrset->free;
>           addrset->free = new->next;
>         }
>       else
>         new = obstack_alloc (addrset->obstack, sizeof (*new));
>     }
>   else
>     new = xmalloc (sizeof (*new));
> 
>   new->next = 0;
>   new->start = start;
>   new->end = end;
> 
>   return new;
> }
> 
> 
> static void
> free_node (struct addrset *addrset, struct node *node)
> {
>   if (addrset->obstack)
>     {
>       node->next = addrset->free;
>       addrset->free = node;
>     }
>   else
>     xfree (node);
> }
> 
> 
> /* Return true if A is less than B, and there is at least one
>    address between them.  */
> static int
> less_and_separate (CORE_ADDR a, CORE_ADDR b)
> {
>   /* We need both tests; think about what happens, for example, when
>      B is zero and A is (CORE_ADDR) -1.  */ 
>   return (a < b && b - a >= 2);
> }
> 
> 
> /* Shift nodes from `before' to `after', or in the other direction,
>    until the split falls at the right place.  By `the right place', we
>    mean:
>    - any nodes on `after' are after, and do not abut, the range from
>      START to END, and
>    - any nodes on `before' are either completely before, abut, or overlap
>      the range from START to END.
>    Once this is done, any nodes on `after' are irrelevant to our
>    operation, and any nodes that abut or overlap our range are at the
>    head of `before'.  */
> static void
> resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node;
> 
>   /* We know the `before' and `after' lists are each sorted in the
>      proper direction, and contain non-overlapping, non-adjacent
>      ranges.
> 
>      The first step is to move all ranges from `after' to `before'
>      that are before, overlap, or abut the range we're operating on.
>      Be careful of nodes abutting either end of the address space.  */
>   while ((node = addrset->after) != 0
>          && ! less_and_separate (end, node->start))
>     {
>       addrset->after = node->next;
>       node->next = addrset->before;
>       addrset->before = node;
>     }
> 
>   /* Now, any ranges in `after' are after and do not abut the range
>      we're operating on.  This means that all those nodes are
>      irrelevant to this operation, and we can ignore them.
> 
>      Thus, any nodes we care about are somewhere in `before'.  Bring
>      them to the head of that list by shifting any nodes from `before'
>      to `after' that are completely after the range we're operating
>      on.
> 
>      The astute reader will note that the condition for moving a node
>      in this loop is the complement of that for the previous loop.
>      That is, the condition for moving a node in one direction is the
>      opposite of moving it in the other direction.  This means that
>      only one of these two loops' bodies will ever be executed in a
>      given call.  */
>   while ((node = addrset->before) != 0
>          && less_and_separate (end, node->start))
>     {
>       addrset->before = node->next;
>       node->next = addrset->after;
>       addrset->after = node;
>     }
> }
> 
> 
> void
> addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node, *save;
> 
>   gdb_assert (start <= end);
> 
>   resplit (addrset, start, end);
> 
>   /* Now we know that any ranges in `after' are irrelevant to this
>      operation, and that zero or more nodes at the head of `before'
>      abut or overlap the range we're adding.  Remove any such nodes
>      from `before', and absorb their ranges into ours.  */
>   save = 0;
>   while ((node = addrset->before) != 0
>          && ! less_and_separate (node->end, start))
>     {
>       if (node->start < start)
>         start = node->start;
>       if (node->end > end)
>         end = node->end;
>       addrset->before = node->next;
> 
>       /* We want to remove this node from the list, but we know we're
>          going to insert a new one in a bit, so hold onto the first
>          node we delete, just to save a call to xfree and xmalloc.  */
>       if (save)
>         free_node (addrset, node);
>       else
>         save = node;
>     }
> 
>   /* Create a node for the new range.  */
>   if (save)
>     {
>       save->start = start;
>       save->end = end;
>     }
>   else
>     save = new_node (addrset, start, end);
> 
>   /* Stick it on the head of the `before' list.  */
>   save->next = addrset->before;
>   addrset->before = save;
> }
> 
> 
> void
> addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node;
> 
>   gdb_assert (start <= end);
> 
>   /* This could end up moving a bunch of nodes to `before' that we're
>      just going to delete anyway.  But our running time is linear in
>      the number of ranges we're going to delete anyway, and this lets
>      us deal with the `before' list only; otherwise, we'd have to
>      write out the code twice, once for each list.  */
>   resplit (addrset, start, end);
> 
>   /* Now we know that any ranges in `after' are irrelevant to this
>      operation, and that zero or more nodes at the head of `before'
>      abut or overlap the range we're adding.
> 
>      If the node at the head of the `before' list extends after our
>      range, either truncate it and shift it to `after', or split it in
>      two.  */
>   if ((node = addrset->before) != 0
>       && end < node->end)
>     {
>       /* If it also extends before the range we're deleting, we need
>          to split it into two nodes: one for the part before and one
>          for the part after.  And we know we don't need to do anything
>          else to the `before' list.  */
>       if (node->start < start)
>         {
>           struct node *right_part = new_node (addrset, end + 1, node->end);
>           right_part->next = addrset->after;
>           addrset->after = right_part;
>           node->end = start - 1;
>           return;
>         }
> 
>       /* Just shift the node onto `after', and adjust its size.  */
>       addrset->before = node->next;
>       node->next = addrset->after;
>       addrset->after = node;
>       node->start = end + 1;
> 
>       /* There may be more nodes on `before' that we need to adjust.  */
>     }
> 
>   /* Now we know that the head of `before' doesn't extend beyond the
>      end of the range we're deleting.  Nothing will need to be added
>      to `after'.
> 
>      Remove any nodes completely contained in the region we're
>      deleting.  */
>   while ((node = addrset->before) != 0
>          && start <= node->start)
>     {
>       addrset->before = node->next;
>       free_node (addrset, node);
>     }
> 
>   /* The head of `before' (if any) isn't completely contained in the
>      range we're deleting, but it may partially overlap it at the low
>      end.  If that's the case, adjust its endpoint.  */
>   if ((node = addrset->before) != 0
>       && start <= node->end)
>     node->end = start - 1;
> }
> 
> 
> /* Shift *ADDR by OFFSET, and check that the direction of change from
>    OLD to NEW is consistent with *DIR.  If it isn't, raise an internal
>    error.  OFFSET must be non-zero.  See offset_node_list for the
>    meaning of *DIR.  */
> static void
> shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir)
> {
>   CORE_ADDR old = *addr;
>   CORE_ADDR new = (*addr += offset);
> 
>   if (new < old)
>     {
>       if (! *dir)
>         *dir = -1;
>       else
>         gdb_assert (*dir == -1);
>     }
>   else
>     {
>       if (! *dir)
>         *dir = 1;
>       else
>         gdb_assert (*dir == 1);
>     }
> }
> 
> 
> /* Offset addresses in NODE and its successors by OFFSET.
>    OFFSET must be non-zero.
>    *DIR is the direction of the shift, used for error checking.
>    - If *DIR is zero, then we don't know which direction things will be
>      shifted yet.
>    - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes
>      downwards or upwards, and we should raise an internal_error if any
>      of our addresses go in the other direction.  */
> static void
> offset_node_list (struct node *node, CORE_ADDR offset, int *dir)
> {
>   while (node)
>     {
>       shift_addr (&node->start, offset, dir);
>       shift_addr (&node->end, offset, dir);
> 
>       node = node->next;
>     }
> }
> 
> 
> void
> addrset_offset (struct addrset *addrset, CORE_ADDR offset)
> {
>   int dir = 0;
> 
>   if (offset == 0)
>     return;
> 
>   offset_node_list (addrset->before, offset, &dir);
>   offset_node_list (addrset->after, offset, &dir);
> }
> 
> 
> int
> addrset_in (struct addrset *addrset, CORE_ADDR addr)
> {
>   struct node *node;
> 
>   resplit (addrset, addr, addr);
> 
>   /* Now any node containing addr would have to be on `before'.  */
>   return ((node = addrset->before) != 0
>           && node->start <= addr && addr <= node->end);
> }
> 
> 
> int
> addrset_span (struct addrset *addrset,
>               CORE_ADDR addr,
>               CORE_ADDR *start,
>               CORE_ADDR *end)
> {
>   struct node *node;
> 
>   resplit (addrset, addr, addr);
> 
>   /* Now any node containing addr would have to be on `before'.  */
>   if ((node = addrset->before) != 0
>       && node->start <= addr && addr <= node->end)
>     {
>       *start = node->start;
>       *end = node->end;
>       return 1;
>     }
>   else
>     return 0;
> }
> 
> 
> void
> addrset_first_range (struct addrset *addrset,
>                      CORE_ADDR *start,
>                      CORE_ADDR *end)
> {
>   struct node *node;
>   
>   resplit (addrset, 0, 0);
>   
>   /* If there's anything on `before', that's our first range.
>      Otherwise, if there's anything on after, that's it.  */
>   if ((node = addrset->before) != 0
>       || (node = addrset->after) != 0)
>     {
>       *start = node->start;
>       *end = node->end;
>     }
>   else
>     {
>       /* ADDRSET is empty.  */
>       *start = 1;
>       *end = 0;
>     }
> }
> 
> 
> void
> addrset_next_range (struct addrset *addrset,
>                     CORE_ADDR after,
>                     CORE_ADDR *start,
>                     CORE_ADDR *end)
> {
>   struct node *node;
> 
>   resplit (addrset, after, after);
> 
>   /* The first node on `before' might extend beyond our range.  */
>   if ((node = addrset->before) != 0
>       && after < node->end)
>     {
>       *start = after + 1;
>       *end = node->end;
>     }
> 
>   /* Otherwise, the first node on `after' is the one we want.  */
>   else if ((node = addrset->after) != 0)
>     {
>       *start = node->start;
>       *end = node->end;
>     }
> 
>   /* Otherwise, there are no more addresses in ADDRSET after AFTER.  */
>   else
>     {
>       *start = 1;
>       *end = 0;
>     }
> }
> 
> 
> void
> addrset_print (struct addrset *addrset, struct ui_file *outfile)
> {
>   CORE_ADDR start, end;
>   int first = 1;
> 
>   for (addrset_first_range (addrset, &start, &end);
>        start <= end;
>        addrset_next_range (addrset, end, &start, &end))
>     {
>       if (! first)
>         fprintf_filtered (outfile, ", ");
>       print_address_numeric (start, 1, outfile);
>       fprintf_filtered (outfile, "-");
>       print_address_numeric (end, 1, outfile);
>       first = 0;
>     }
> }
> 

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer
References:
- RFC: partial symbol table address range generalization
  - From: Jim Blandy
- Re: RFC: partial symbol table address range generalization
  - From: Daniel Jacobowitz
- Re: RFC: partial symbol table address range generalization
  - From: Jim Blandy
- Re: RFC: partial symbol table address range generalization
  - From: Daniel Jacobowitz
- Re: RFC: partial symbol table address range generalization
  - From: Jim Blandy
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]