This is the mail archive of the binutils@sourceware.cygnus.com mailing list for the binutils project.



stabs and N_EXCL


Hello,

It appears that excluding include file tags -- a.k.a. N_EXCL treatment --
by GNU ld can (and does) give rise to badly formed stabs sections in
resulting executables, with inevitable loss of some debug information.
(My point of entry to this is that I'm writing my own stabs parsing
routines.)

I'll explain briefly my understanding of the problem (please correct me
wherever I'm wrong).

The relevant code is in bfd/stabs.c, function _bfd_link_section_stabs().
The linker scans the stabs of each include file, i.e. those found between
an N_BINCL stab and its matching N_EINCL stab (ignoring nested includes),
and computes a hash value by simply adding together the numeric values of
all characters in the stab strings, while skipping the file numbers inside
(filenumber,typenumber) pairs. This skipping is intended to let the linker
recognize two inclusions of a header file as identical even though they
may have different filenumbers when included from different source files.
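
A minimal sketch of the hash as described above, written in Python for
brevity. It is not the code from bfd/stabs.c; the function name
include_hash and the sample stab strings are made up for illustration.

```python
def include_hash(stab_strings):
    """Sum the character values of all stab strings of one include,
    skipping the file number that follows each '('."""
    total = 0
    for s in stab_strings:
        i = 0
        while i < len(s):
            total += ord(s[i])
            if s[i] == "(":
                # Skip the digits of the file number; the ',' and the
                # type number after it are still counted.
                i += 1
                while i < len(s) and s[i] in "0123456789":
                    i += 1
                continue
            i += 1
    return total


# The same header seen with two different file numbers hashes identically.
print(include_hash(["foo:t(1,5)=s4;"]) == include_hash(["foo:t(7,5)=s4;"]))
```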

When a hash value matches a value from a previous include of the same
header file, all stabs in this include are deleted, and an N_EXCL stab
is inserted. Both N_BINCL and N_EXCL stabs will carry the hash value
in the n_value field of the stab, to allow debuggers to find the correct
symbols. 
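
The replacement step might be sketched roughly as follows. This is a
simplified model, not the bfd implementation: it assumes well-formed input
with no nested includes, represents each stab as a hypothetical
(n_type, string, n_value) tuple, and uses made-up names (merge_includes,
seen).

```python
import re

N_BINCL, N_EINCL, N_EXCL = 0x82, 0xA2, 0xC2  # stab type codes


def include_hash(strings):
    # Sum of character values, with the file number after each "(" dropped.
    return sum(ord(c) for s in strings for c in re.sub(r"\(\d+", "(", s))


def merge_includes(stabs, seen):
    """Return the stabs of one object file with repeated includes replaced
    by a single N_EXCL stab. `seen` holds (header name, hash) pairs and is
    shared across all object files in the link."""
    out = []
    i = 0
    while i < len(stabs):
        n_type, name, _ = stabs[i]
        if n_type != N_BINCL:
            out.append(stabs[i])
            i += 1
            continue
        end = i + 1
        while stabs[end][0] != N_EINCL:  # assumes a matching N_EINCL exists
            end += 1
        body = stabs[i + 1:end]
        h = include_hash(s for _, s, _ in body)
        if (name, h) in seen:
            out.append((N_EXCL, name, h))      # drop the whole include
        else:
            seen.add((name, h))
            out.append((N_BINCL, name, h))     # keep it, record the hash
            out.extend(body)
            out.append(stabs[end])
        i = end + 1
    return out
```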

The docs claim that this is exactly what Sun's linker does to compute
the hash, so binutils simply reimplements the same algorithm. However,
I don't know whether Sun's compiler tools emit stabs, in an include file,
that always define types with the current include file filenumber -- if
they do, the problem should not arise. The problem I'm about to describe
arises because of the following basic scenario:

1. Source file foo.c includes header file bar1.h and then bar2.h
2. bar1.h forward-declares ``struct foo;'' ; at this point ``foo'' is 
assigned a typenumber, say (1,5) where 1 is the filenumber of bar1.h .
3. bar2.h actually defines struct foo. Even though "native" type definitions
of bar2.h have the form (2,x)=... because 2 is its filenumber, the
definition for struct foo looks like (1,5)=... because it had been assigned
a typenumber beforehand.
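
The numbering behaviour in this scenario can be mimicked with a toy model.
The sketch below is not how any real compiler is implemented; emit_stabs,
the event lists, the extra type "baz" and the struct stab body are all
hypothetical. It only assumes that a type receives its
(filenumber,typenumber) pair in whichever file first mentions it.

```python
def emit_stabs(include_order, headers):
    """headers maps a header name to a list of (kind, type_name) events,
    where kind is "declare" or "define". Returns, per header, the stab
    strings for the types it defines."""
    numbers = {}    # type name -> (filenumber, typenumber)
    next_type = {}  # filenumber -> next free typenumber
    out = {}
    for filenum, hdr in enumerate(include_order, start=1):
        stabs = []
        for kind, tname in headers[hdr]:
            if tname not in numbers:
                # First mention: number the type within the current file.
                n = next_type.get(filenum, 1)
                next_type[filenum] = n + 1
                numbers[tname] = (filenum, n)
            if kind == "define":
                f, t = numbers[tname]
                stabs.append(f"{tname}:T({f},{t})=s4x:(0,1),0,32;;")
        out[hdr] = stabs
    return out


headers = {
    "bar1.h": [("declare", "foo")],
    "bar2.h": [("define", "baz"), ("define", "foo")],
}
# foo.c includes bar1.h, then bar2.h; another file includes only bar2.h.
print(emit_stabs(["bar1.h", "bar2.h"], headers)["bar2.h"])
print(emit_stabs(["bar2.h"], headers)["bar2.h"])
```

In the first call struct foo keeps the pair it was given in bar1.h, while in
the second it is numbered as one of bar2.h's own types, after baz.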

When the linker encounters bar2.h's stabs in foo.o, it'll compute the hash,
striking out filenumbers without distinguishing bar2.h's "own" definitions
from "foreign" definitions. However, the typenumbers of bar2.h's "foreign"
definitions fluctuate wildly based on which header files were included
before it in a particular source file. Even the typenumbers of bar2.h's
"own" definitions fluctuate because, for instance, if another source file
includes only bar2.h, struct foo will suddenly become its "own"
definition with a native filenumber, and will shift the sequence of "native"
typenumbers.

Finally, the gist of the matter: it does in fact happen that two different
sets of stabs for the same header file, which have different typenumbers,
just happen to have the same hash value (because the fluctuations in typenumbers
are small and cancel each other out when the character values are summed). The
second instance of the header will be deleted from the executable, but the first
instance's typenumbers will not match the typenumbers demanded by definitions in the
rest of the second source file. I believe there is irreparable loss of 
information at this point, i.e. a debugger simply cannot be smart enough 
to link types correctly because the typenumbers don't match.
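
To make the cancellation concrete, here is a small constructed illustration
(these are not the real stabs mentioned in this message). The helper
additive_hash is a compact stand-in for the summing hash described earlier,
and the two stab strings are made up: typenumbers 15 and 24 differ, yet
their digits add up to the same total.

```python
import re


def additive_hash(stab_strings):
    # Drop the file number after each "(", then sum the character values.
    return sum(ord(c) for s in stab_strings for c in re.sub(r"\(\d+", "(", s))


first = ["foo:T(1,15)=s4x:(0,1),0,32;;"]   # struct foo is type 15 here
second = ["foo:T(3,24)=s4x:(0,1),0,32;;"]  # ... and type 24 here

# '1' + '5' and '2' + '4' have the same sum, so the hashes coincide.
print(additive_hash(first) == additive_hash(second))
```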

I have the actual examples of source/include files, and stabs of the
include files which are different in their typenumbers, and yet have
the same hash value, and will send them upon request. The scenario seems
to appear simply due to the law of large numbers, i.e. in a large project,
with many different possible include paths for the same header file, it'll
just happen.

I think it would be great if the hash computing algorithm could be made
a bit smarter to make such coincidences completely improbable, and have
some ideas to offer on how this could be done. Since the hash computing
algorithm is completely internal to the linker, and is never used by
debuggers, it doesn't seem overly important to avoid deviation from Sun's
original algorithm.
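
One possible direction, offered only as an illustration and not necessarily
what is meant above, is to make the hash sensitive to character position
while still skipping file numbers. The sketch below uses a simple
multiply-and-add hash; the name positional_hash, the multiplier 31 and the
32-bit mask (on the assumption that the result should fit in n_value) are
choices made for this example.

```python
import re


def positional_hash(stab_strings):
    """Order-sensitive hash over stab strings, ignoring file numbers."""
    h = 0
    for s in stab_strings:
        # "\0" marks the end of each string so that boundaries matter.
        for ch in re.sub(r"\(\d+", "(", s) + "\0":
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


same_a = ["foo:T(1,15)=s4x:(0,1),0,32;;"]
same_b = ["foo:T(3,15)=s4x:(0,1),0,32;;"]  # differs only in file number
other = ["foo:T(3,24)=s4x:(0,1),0,32;;"]   # differs in type number

print(positional_hash(same_a) == positional_hash(same_b))
print(positional_hash(same_a) == positional_hash(other))
```

Collisions remain possible with any 32-bit hash, but small shifts in
typenumbers no longer cancel out the way they do under plain addition.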

Comments?

Thanks,
Anatoly.

-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton
