This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



The cool demangler substitution bug fix


[Usually I do this kind of work to figure out what patch broke something,
 and how.  This time, I'm figuring out what patch fixed something, and how.
 It's a very happy story.]

Yesterday I put up this table:

   gdb  memory  utime  stime  elapsed
   5.3    263M  85.47  26.80   189.68
   6.0    264M  83.76  26.08   187.90
   H19    263M  83.44  25.99   190.00
   H21     46M   7.37   2.07     9.46
   H30     46M   2.72   0.29     3.03

I figured out what happened between H19 and H21.  There was one
bug fix patch from Mark Mitchell:

  http://gcc.gnu.org/ml/gcc-patches/2003-11/msg01583.html
  PATCH: Fix demangler bug

This patch is a one-line fix and is so simple that it is unlikely
to have any bad side effects.

Here are some details.  A C++ mangled name can use substitution tokens
that refer back to components appearing earlier in the name, which keeps
the name short.  A sample name with a lot of substitution tokens looks
like this:

  _Z3fooiPiPS_PS0_PS1_PS2_PS3_PS4_PS5_PS6_PS7_PS8_PS9_PSA_PSB_PSC_

The substitution tokens are S_, S0_, S1_, ... S9_, SA_, SB_, ... SZ_.
After SZ_ comes S10_ and so on.
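For reference, here is a minimal sketch of how these tokens map to positions in the substitution table. It assumes the Itanium C++ ABI encoding that g++ 3.x uses. The characters between S and _ form a base-36 "seq-id" written with 0-9 followed by uppercase A-Z. S_ is the first entry, and S<seq-id>_ is entry seq-id + 1. The helper names substitution_index and substitution_token are hypothetical and are not functions from libiberty.

```python
import string

# Digits of the base-36 seq-id, in order: 0-9 then uppercase A-Z.
SEQ_DIGITS = string.digits + string.ascii_uppercase


def substitution_index(token):
    """Map a token such as "S_", "S0_" or "SA_" to its 0-based position in
    the substitution table (hypothetical helper, Itanium ABI rules assumed)."""
    if not (token.startswith("S") and token.endswith("_")):
        raise ValueError(f"not a substitution token: {token!r}")
    seq = token[1:-1]
    if seq == "":
        return 0  # "S_" names the first substitution
    value = 0
    for ch in seq:
        digit = SEQ_DIGITS.find(ch)
        if digit < 0:
            raise ValueError(f"bad seq-id digit {ch!r} in {token!r}")
        value = value * 36 + digit
    return value + 1  # "S<seq-id>_" names substitution seq-id + 1


def substitution_token(index):
    """Inverse of substitution_index: build the token for a 0-based index."""
    if index < 0:
        raise ValueError("index must be non-negative")
    if index == 0:
        return "S_"
    n = index - 1
    digits = ""
    while True:
        n, r = divmod(n, 36)
        digits = SEQ_DIGITS[r] + digits
        if n == 0:
            break
    return "S" + digits + "_"


for tok in ["S_", "S0_", "S9_", "SA_", "SZ_", "S10_"]:
    print(tok, substitution_index(tok))
```

This prints S_ 0, S0_ 1, S9_ 10, SA_ 11, SZ_ 36 and S10_ 37. SA_ is therefore the first token whose seq-id needs a letter, and it names the 12th table entry.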

The rules for indexing the substitution tokens are complex and I don't
fully understand them.  The old demangler had a bug where it would screw
up its index tables starting with SA_.  So if a mangled name has 12 or
more substitutions, the old demangler would sometimes use incorrect
replacement tokens and produce a bogus demangled name.  Depending on the
details, the bogus demangled name could be much, much bigger than the
correct demangled name.
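To make the "index table" concrete, here is a toy demangler. It is a sketch that handles only the three constructs in the sample name. The first is i (int, a builtin type that never enters the table). The second is P<type> (a pointer type, which is added to the table once parsed). The third is S<seq-id>_ (a reference back into the table). It assumes the Itanium C++ ABI substitution rules and is nothing like the real cp-demangle.c. The name demangle_toy is made up for illustration.

```python
import re


def demangle_toy(mangled):
    """Toy demangler for names of the form _Z<len><name><params>, where each
    parameter uses only 'i' (int), 'P' (pointer to) and S<seq-id>_ (hypothetical)."""
    m = re.match(r"_Z(\d+)", mangled)
    if not m:
        raise ValueError("expected _Z<length><name>")
    pos = m.end()
    name = mangled[pos:pos + int(m.group(1))]
    pos += len(name)
    subs = []  # substitution table: S_ is subs[0], S0_ is subs[1], ...

    def parse_type():
        nonlocal pos
        code = mangled[pos]
        pos += 1
        if code == "i":
            return "int"  # builtin type: never added to the table
        if code == "P":
            t = parse_type() + "*"
            subs.append(t)  # the pointer type becomes a substitution candidate
            return t
        if code == "S":
            end = mangled.index("_", pos)
            seq = mangled[pos:end]  # base-36 seq-id (may be empty)
            pos = end + 1
            index = 0 if seq == "" else int(seq, 36) + 1
            return subs[index]  # reuse an earlier type; no new entry is added
        raise ValueError(f"unsupported type code {code!r}")

    params = []
    while pos < len(mangled):
        params.append(parse_type())
    return f"{name}({', '.join(params)})"


SAMPLE = "_Z3fooiPiPS_PS0_PS1_PS2_PS3_PS4_PS5_PS6_PS7_PS8_PS9_PSA_PSB_PSC_"
print(demangle_toy(SAMPLE))
```

On the sample name this gives foo(int, int*, int**, ...) with sixteen parameters, the last being int followed by fifteen stars. Each P adds a new entry that later tokens refer to. A single wrong lookup therefore also corrupts every entry built on top of it, which is one way a bad index can snowball into a much larger result.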

That particular mangled name is in the demangler test suite now, along
with three other names with S[A-Z]_ substitutions.

I am experimenting with c++filt from binutils 2.14 before and after this
fix.  Here are some measurements of "c++filt < all.list", where all.list
is the list of C++ symbols from the "monotone" executable.

    c++filt     output  utime  stime
    2.14     278921468  78.71  26.32
    2.14+     31571627   6.75   2.26
    HEAD      28302900   1.70   0.26

"2.14+" is an exact copy of 2.14 with the one fix applied.
"HEAD" is binutils HEAD 2003-12-04 20:35:37 UTC, which is the new
demangler including today's fixes for "operator< <" and suchlike.

You can see that this fix accounts for the improvement in gdb.  It's
such an enormous difference in speed that it dominates all other issues
about demangler speed, if the program has big hairy template symbols
with enough S[A-Z]_ substitutions in them.
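The claim that this one fix explains the gdb speedup can be checked with a little arithmetic on the two tables. This is a rough sketch that assumes the utime and stime columns in both tables are seconds. gdb and c++filt are running different workloads, so only the rough agreement between the numbers matters.

```python
# Before/after figures copied from the two tables above (presumably seconds).
GDB = {"utime": (83.44, 7.37), "stime": (25.99, 2.07)}      # H19 -> H21
CXXFILT = {"utime": (78.71, 6.75), "stime": (26.32, 2.26)}  # 2.14 -> 2.14+


def savings(before_after):
    """Return how much each column dropped, rounded to two decimals."""
    return {col: round(before - after, 2)
            for col, (before, after) in before_after.items()}


gdb_saved = savings(GDB)
filt_saved = savings(CXXFILT)
for col in ("utime", "stime"):
    print(f"{col}: gdb saved {gdb_saved[col]:.2f}, "
          f"c++filt saved {filt_saved[col]:.2f}")
```

This reports a utime saving of 76.07 for gdb against 71.96 for c++filt, and stime savings of 23.92 against 24.06. That is consistent with the demangler fix accounting for nearly all of the time gdb saved between H19 and H21.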

So, anyone who is working on big C++ programs with gdb 6.0 will benefit
from applying Mark's one-liner patch to libiberty/cp-demangle.c and
rebuilding their gdb.  Any vendors who are shipping gdb and don't want
to adopt the new demangler yet may want to apply that patch to their
old demangler right away.  If we produce gdb 6.0.1 (although I hope we
don't) then this patch should get regression-tested and then go in.

And what about the new demangler?  It's much faster than the old demangler,
even with the bug fix on the old demangler.  Of course we would want the
new demangler anyway, even if it were the same speed, but it's nice to see
a big speed improvement there too.

So let's get everybody patched up with old demangler + patch,
or with new demangler, and then see if there are any remaining
speed issues to talk about.

Michael C

