This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Demangling and searches

From: Paul Hilfinger <hilfingr at CS dot Berkeley dot EDU>
To: David Carlton <carlton at math dot stanford dot edu>
Cc: Elena Zannoni <ezannoni at redhat dot com>, Adam Fedor <fedor at doc dot com>, GDB Patches <gdb at sources dot redhat dot com>, Daniel Jacobowitz <drow at mvista dot com>
Date: Wed, 08 Jan 2003 18:37:57 -0800
Subject: Re: Demangling and searches

 > I'm curious: in Ada, what does the mangling do?  In particular, how
 > much type info does it contain?  In C++, the mangled name contains
 > type info for the arguments for functions; I don't see how, using
 > GDB's current data structures, to allow us to allow users to, say,
 > break on a function without requiring them to specify the types of the
 > arguments, if we took your approach.  (Though it might be possible to
 > modify GDB's data structures to allow that.)

In Ada, the mangled name does not contain type information, but we
actually solve an even harder problem.  The mangled name contains
certain information that the user doesn't necessarily know, so that the system
CANNOT reconstruct the full mangled name from the user's input a
priori.  Yet nevertheless, we look up quite successfully, thank you
very much.  The technique is simple: the mangled names have the form

     <NAME><OTHER INFO>

where the syntax of <OTHER INFO> is such that we can find its
beginning.  In Ada mode, GDB has to use a special lookup function
that ignores <OTHER INFO>, but otherwise the algorithm is the same as
for C or C++, etc.

Let's look at this a bit abstractly.  We're given a key K to
search for, among a set of symbols, S.  There is some predicate, P,
that defines success: you want s in S such that P(K, s).   The current
scheme achieves speed with two complementary strategies:

       1. Use K to find a subset, S', of S, limiting the number of 
          comparisons.
       2. Speed up P with preprocessing: P has (roughly) the form
                 K equals SYMBOL_SOURCE_NAME (s)
	  where SYMBOL_SOURCE_NAME (s) = f (SYMBOL_NAME (s), SYMBOL_LANGUAGE(s)),

I am asking why we can't change P to 

       K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)),

or, more abstractly, to something like

       compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0

saving considerable space in the process.  (Daniel Berlin points out
that ABI also figures into these, but here I'll just go by the
parameterization in symtab.h).  The answer is "because f is costly
when applied to lots of symbols."  But this answer really makes sense
only if strategy 1 above is ineffective.

If your symbol-search structure is a hash table, then all you have to
do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is irrelevant
whether you actually store the SYMBOL_SOURCE_NAME in s.  Hash tables
have precisely the effect of drastically decreasing the size of S'.  
If you are relying on binary search, admittedly, things are not so
simple, since the sort itself requires several evaluations of
SYMBOL_SOURCE_NAME for each symbol (but then, I've always preferred
hash tables (:->)).


Paul Hilfinger

Follow-Ups:
- Re: Demangling and searches
  - From: David Carlton

References:
- Re: Demangling and searches
  - From: David Carlton

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]