This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Fear and loathing in lookup_block_symbol
- To: gdb-patches at sources dot redhat dot com
- Subject: Fear and loathing in lookup_block_symbol
- From: Jason Molenda <jmolenda at apple dot com>
- Date: Thu, 6 Sep 2001 11:41:01 -0700
This is a little long and involved, but there is a big problem in one of
the lowest level symbol lookup functions, so (someone) please bear with
it all.
Daniel put together some great C++ symbol lookups last summer
http://sources.redhat.com/ml/gdb-patches/2000-08/msg00199.html
Real problems - I'd just started digging into #3 on his set of patches
myself.
Elena looks at the patch a little while longer, raises some issues (like
why the namespace check is no longer done in lookup_block_symbol - good
catch Elena):
http://sources.redhat.com/ml/gdb-patches/2000-09/msg00237.html
There is some back and forth discussion. Elena commits some of the
changes (all of them? I haven't read all the archives that thoroughly,
and it's not relevant)
http://sources.redhat.com/ml/gdb-cvs/2000-10/msg00014.html
Incidentally, Daniel (tisk tisk :) had assured her that the namespace
check wasn't necessary before the commit happened. There is some more
back-and-forth on these patches -- Daniel wasn't especially thorough in
updating all callers of lookup_block_symbol, but Peter Schauer jumps in
and fixes several of these. Daniel says he'll just yank the whole lot
of them and submits a patch to revert it all, but I don't think that
ever happened (again, my goal isn't accurate historical
representation -- there's a bug to be fixed here. :-)
Side bar - we see gdb/15 pop up and Michael Chastain chase it down
http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html
It's because the namespace check is no longer being done. Michael is
mystified
as to how this happened.
OK OK, I'll get to the real bug. The point is that these patches have a
long and seedy history. :-) And at the same time, I want to throw out
props to Daniel -- despite everything, he is getting at real, BIG, gdb
inefficiencies, and he's trying to fix them. A little more follow-up
and completeness would have been good, that's all.
In the patch that Elena committed (URL above), which makes C++ symbol
lookup use the binary search method has a mistake. lookup_block_symbol
has two search methods. One is a binary search; it gets "close" to the
symbol (alphabetically 'below' the correct entry) and then searches over
the next few symbols. The other method is a carefully written linear
search. In the binary search method, this linear-stepping second half
is necessary because we might have multiple symbol names in different
namespaces (e.g. function vs struct namespaces).
Daniel's patch inadvertently changes this second half of the binary
search method -- instead of stepping over a few symbols, gdb now binary
searches to the middle of the block symbols, then linearly steps over
all symbols until it gets to the end of the block. This is very
expensive. Specifically, I'm talking about this patch here:
@@ -1247,19 +1282,8 @@ lookup_block_symbol (register const stru
while (bot < top)
{
sym = BLOCK_SYM (block, bot);
- inc = SYMBOL_NAME (sym)[0] - name[0];
- if (inc == 0)
- {
- inc = STRCMP (SYMBOL_NAME (sym), name);
- }
- if (inc == 0 && SYMBOL_NAMESPACE (sym) == namespace)
- {
- return (sym);
- }
- if (inc > 0)
- {
- break;
- }
+ if (SYMBOL_MATCHES_NAME (sym, name))
+ return sym;
bot++;
}
}
The old code is straightforward - the binary search has put us at the
beginning of matching symbols (the 'bot' index). If the symbol names
match, and the namespaces match, we return the sym. If we've stepped
beyond the block symbols that could match our symbol name, we break out
of the loop. Daniel's change loses this critical "break out of the
loop" part, and iterates over all the block symbols from the
binary-search start point until the end.
We can't copy the old code's style exactly. SYMBOL_MATCHES_NAME calls
strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw
is poorly named; it doesn't give you a less-than greater-than return
value like strcmp(). It'd be better termed streq_iw() or something
unlike strcmp(). But we can make vast improvements without digging into
this problem by changing the code to something like:
while (bot < top)
{
sym = BLOCK_SYM (block, bot);
if (SYMBOL_NAMESPACE (sym) == namespace &&
SYMBOL_MATCHES_NAME (sym, name))
{
return sym;
}
if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
{
break;
}
bot++;
}
The search is not optimal (we'll probably end up stepping over more
symbols than is necessary), but it's a huge improvement over the current
one.
A specific example: One of our demo programs that we ship with the
development tools is a little notepad GUI app. It has these opaque
structure pointers that I mentioned last week, which gdb will never find
a definition of. These opaque structure pointers are often in programs
as local variables, so stepping with a Locals window open highlights any
delay here. With the old iterate-over-half-the-symbols approach, "maint
time 1" reported 1.6 - 1.7 seconds to look grope through all the symbols
looking for this structure. With this break-out clause added, "maint
time 1" reports 0.0 seconds (it's below the measurement threshold).
(and before that second call to check_typedef() was removed, doing "info
locals" would take around 3.5 seconds.)
If someone will agree that this approach is reasonable, I'll check in a
patch. I've already run the above patch through the testsuite, it adds
no new regressions.
Jason
PS- This patch was brought to you by Tom Tromey's gdb-profile patch.
Yes, gdb-profile, it does the things a profiling patch ought to do. I
fired up his patch on gdb, found this naughty little
lookup_block_symbol() function in about one minute, and started digging
in. Very helpful. Don't you wish YOU could use Tom Tromey's
gdb-profile patch?