This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



Removal of demangled names from partial symbols


Demangled names were removed from partial symbols to speed start up
times a few years ago.

However, now that the minsym demangled hash table exists, we demangle
all minimal symbols when we install them (i.e. we initialize the
demangled name on each one, unconditionally).

Since the minimal symbol table ends up including a large subset of the
mangled partial symbols (if not all of them), this means we already have a large
subset of the partial symbol names demangled for us at start up
anyway.

Why not do a lookup_minimal_symbol on the mangled name in a new
function, add_psymbol_and_dem_name_to_list?  If we get back a symbol,
we use the demangled name from that; otherwise, we demangle the name
ourselves.
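
To make the idea concrete, here is a rough standalone sketch of the
"ask the minsym table first, demangle only on a miss" logic.  It is not
GDB code: the struct, the toy table, and every function name ending in
_sketch are hypothetical stand-ins, and the "demangler" is a stub that
knows a single name and counts how often it is called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a minimal symbol that already carries
   its demangled name.  */
struct msym_sketch
{
  const char *mangled;
  const char *demangled;
};

/* Toy minimal symbol table (assumption: the real one is much larger
   and is searched through its hash tables, not linearly).  */
static const struct msym_sketch msym_table[] = {
  { "_Z3foov", "foo()" },
  { "_Z3bari", "bar(int)" },
};

/* Counts calls to the stub demangler, to show when we avoid it.  */
static int demangler_calls;

/* Hypothetical lookup by mangled name; linear only for brevity.  */
static const struct msym_sketch *
lookup_msym_sketch (const char *mangled)
{
  size_t i;

  for (i = 0; i < sizeof msym_table / sizeof msym_table[0]; i++)
    if (strcmp (msym_table[i].mangled, mangled) == 0)
      return &msym_table[i];
  return NULL;
}

/* Stub demangler: knows exactly one name, returns NULL otherwise.  */
static const char *
demangle_sketch (const char *mangled)
{
  demangler_calls++;
  if (strcmp (mangled, "_Z3bazv") == 0)
    return "baz()";
  return NULL;
}

/* Pick the demangled name to store on a partial symbol: reuse the
   minimal symbol's copy if there is one, otherwise demangle.  */
static const char *
psym_demangled_name_sketch (const char *mangled)
{
  const struct msym_sketch *m;

  assert (mangled != NULL);
  m = lookup_msym_sketch (mangled);
  if (m != NULL && m->demangled != NULL)
    return m->demangled;
  return demangle_sketch (mangled);
}
```

With this, psym_demangled_name_sketch ("_Z3foov") comes straight from
the table without touching the demangler, while "_Z3bazv" falls through
to it.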

Even tests on 100 meg of debug info show we barely add any startup
time at all (5 seconds without, 6 seconds with).
In fact, all of the added startup time is attributable to the fact
that, to save memory, I had it bcache the demangled name in
SYMBOL_INIT_DEMANGLED_NAME.  If you don't bcache it (as is the case
right now), it's in memory in at least the full symbol and the minimal
symbol (it's actually in memory once for every time
SYMBOL_INIT_DEMANGLED_NAME is called on a symbol and the demangling
succeeds).
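
For anyone unfamiliar with the memory argument, the point of caching
the name is that every symbol carrying the same demangled string ends
up pointing at one shared copy.  The following is only a minimal
string-interning sketch of that idea, not GDB's bcache implementation;
intern_string, intern_hash, and the fixed bucket count are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 1024

/* One chained hash table entry owning a single copy of a string.  */
struct intern_entry
{
  struct intern_entry *next;
  char *str;
};

static struct intern_entry *intern_table[INTERN_BUCKETS];

/* Simple multiplicative string hash (illustrative choice only).  */
static unsigned long
intern_hash (const char *s)
{
  unsigned long h = 5381;

  while (*s != '\0')
    h = h * 33 + (unsigned char) *s++;
  return h;
}

/* Return the one shared copy of S, creating it the first time S is
   seen.  Returns NULL if memory allocation fails.  */
static const char *
intern_string (const char *s)
{
  unsigned long b;
  struct intern_entry *e;
  size_t len;

  assert (s != NULL);
  b = intern_hash (s) % INTERN_BUCKETS;
  for (e = intern_table[b]; e != NULL; e = e->next)
    if (strcmp (e->str, s) == 0)
      return e->str;            /* Already cached: no new copy.  */

  e = malloc (sizeof *e);
  if (e == NULL)
    return NULL;
  len = strlen (s) + 1;
  e->str = malloc (len);
  if (e->str == NULL)
    {
      free (e);
      return NULL;
    }
  memcpy (e->str, s, len);
  e->next = intern_table[b];
  intern_table[b] = e;
  return e->str;
}
```

Two calls with equal strings return the same pointer, so the full
symbol and the minimal symbol would share one copy.  The price is the
hash and compare on every call, which is where the extra second goes.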

I think 1 second on 100 meg of debug info is worth it to not have to
do a linear search on every symbol lookup, which is amazingly slow.
If gdb is using swap at all because of the number of symbols, you are
almost guaranteed to hit the swap hard on *every* single lookup,
since we have to go through every single symbol.

This would solve the problem of not being able to look up partial
symbols by demangled name, and allow us to binary search them without
fear of missing a symbol.
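
As a sketch of why this works: once every partial symbol carries the
name we actually search by, the list can be sorted on that name and
searched with a plain binary search.  The struct and function names
below are hypothetical, not the real partial symbol layout, and the
sketch assumes the search name is the demangled name when there is one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical partial symbol: just the name we search by and a
   payload.  */
struct psym_sketch
{
  const char *search_name;      /* Demangled name if there is one.  */
  int value;
};

/* Order two partial symbols by search name.  */
static int
compare_psym_names (const void *a, const void *b)
{
  const struct psym_sketch *pa = a;
  const struct psym_sketch *pb = b;

  return strcmp (pa->search_name, pb->search_name);
}

/* Sort once, after all partial symbols have been added.  */
static void
sort_psyms_sketch (struct psym_sketch *syms, size_t n)
{
  qsort (syms, n, sizeof *syms, compare_psym_names);
}

/* Binary search by the same key the list was sorted on.  Returns
   NULL if no symbol has that name.  */
static const struct psym_sketch *
find_psym_sketch (const struct psym_sketch *syms, size_t n,
                  const char *name)
{
  struct psym_sketch key;

  assert (name != NULL);
  key.search_name = name;
  key.value = 0;
  return bsearch (&key, syms, n, sizeof *syms, compare_psym_names);
}
```

The search is only safe because the sort key and the lookup key are the
same string.  If the list were sorted by mangled name and searched by
demangled name, a binary search could skip right past the symbol.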

Would this be acceptable?

My next trick after that would be to add a mangled->demangled mapping
structure, if it's necessary to improve speed.  In the cases where we
need to find the demangled name for a mangled one (i.e.
SYMBOL_INIT_DEMANGLED_NAME), we would look the name up in that
structure, rather than in the minimal symbol table, before demangling
it over again.

The reason is that a hash table (in this case, the minimal symbol
demangled hash table used as a lookup table) is the wrong structure
for this.  Demangled names can be *very* large (an average of 82
chars on my large C++ programs), so we always have to hash the entire
string and then do a whole bunch of string compares, because the
chains are long.  This is okay when we hit (except for the long
chains), but misses waste the same amount of time as hits, if not
more.  The string compares on hits also cost a lot because of the
length of the strings.

We really should use a ternary search tree or some structure like it,
which on hits is actually faster (since we don't need multiple
string compares), and on misses is a whole ton faster, since we abort
much sooner.
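
For reference, a ternary search tree for such a mapping could look
roughly like the sketch below.  This is a bare-bones illustration under
my own assumptions (the names tst_node, tst_insert, and tst_lookup are
made up, keys must be non-empty, nothing is ever freed or rebalanced),
not a proposed patch.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per character position; lo/hi hold keys whose character
   here is smaller/larger, eq continues with the next character.  */
struct tst_node
{
  char split;
  struct tst_node *lo, *eq, *hi;
  const char *value;            /* Non-NULL if a key ends here.  */
};

/* Map KEY to VALUE in the tree rooted at *ROOT.  KEY must be
   non-empty.  Returns 0 on success, -1 if allocation fails.  */
static int
tst_insert (struct tst_node **root, const char *key, const char *value)
{
  struct tst_node **pp = root;

  assert (key != NULL && *key != '\0');
  for (;;)
    {
      struct tst_node *n = *pp;

      if (n == NULL)
        {
          n = malloc (sizeof *n);
          if (n == NULL)
            return -1;
          n->split = *key;
          n->lo = n->eq = n->hi = NULL;
          n->value = NULL;
          *pp = n;
        }
      if (*key < n->split)
        pp = &n->lo;
      else if (*key > n->split)
        pp = &n->hi;
      else if (key[1] == '\0')
        {
          n->value = value;     /* Last character: store mapping.  */
          return 0;
        }
      else
        {
          key++;
          pp = &n->eq;
        }
    }
}

/* Return the value stored for KEY, or NULL on a miss.  A miss stops
   at the first character with no matching branch.  */
static const char *
tst_lookup (const struct tst_node *n, const char *key)
{
  assert (key != NULL);
  if (*key == '\0')
    return NULL;
  while (n != NULL)
    {
      if (*key < n->split)
        n = n->lo;
      else if (*key > n->split)
        n = n->hi;
      else if (key[1] == '\0')
        return n->value;
      else
        {
          key++;
          n = n->eq;
        }
    }
  return NULL;
}
```

Each lookup compares single characters and never hashes or compares a
whole string, so a hit reads each character of the key once, and a miss
gives up at the first character that has no branch.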

--Dan

