This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.


[RFD] New binutil 'objsplit'



      Afternoon all,


  As we all know, there are plenty of targets out there that don't implement
--gc-sections in the linker.  It seems to be particularly the case for small
embedded CPU targets, which is a shame, because they are the most likely to
be running with very limited memory resources such that --gc-sections might
be a real help.

  However, every port correctly excludes unreferenced archive members from a
final link.

  So it occurs to me to try and leverage this into proper garbage
collection.  At first glance this might seem to necessitate only placing one
function or data object into each source file so that you can get each one
into a separate object file, which object files could then be built into an
archive.  Or it could be done in the same fashion as libgcc, where every
function is wrapped in #ifdef and the same file is repeatedly recompiled
into separate objects.

  Alas, that sounds way too much like hard work to me.  It takes masses of
build system infrastructure.  You have to put ifdefs around everything.  And
repeatedly recompiling the file is dead slow.

  Gcc comes to the rescue, kind of, with the -ffunction-sections and
-fdata-sections options, which place each function or data object into its
own uniquely-named section.

  All we need now is a way to build an archive where each of those sections
is entered as a separate archive member.

  At first I looked at modifying ar to do this, but it didn't seem too easy
to fit into the structure of ar, and there wasn't a whole lot of useful
functionality in ar that would help me.  Then I noticed that objcopy has
options for keeping or removing sections on-the-fly in the course of copying
the file.  So I decided to use objcopy to separate each of the sections in
an object file into individual object files, which could then be archived by
a standard version of ar.
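
  As a rough sketch of that first, naive approach, the helper below only
builds the command lines and does not run anything.  The function name
split_commands, the output file naming scheme and the example file names are
assumptions made for illustration; the only tool options used are objcopy's
--only-section and ar's "rc" operation.  As the following paragraphs explain,
this on its own does not give usable ELF objects.

```python
def split_commands(obj, sections, archive):
    """Build argv lists that split `obj` per section and archive the pieces.

    Output names are a made-up convention: app.o + .text.foo -> app.text.foo.o
    """
    stem = obj[:-2] if obj.endswith(".o") else obj
    cmds, members = [], []
    for sec in sections:
        out = f"{stem}{sec}.o"
        # One objcopy run per section, keeping only that section.
        cmds.append(["objcopy", f"--only-section={sec}", obj, out])
        members.append(out)
    # A single ar run collects all the per-section objects.
    cmds.append(["ar", "rc", archive] + members)
    return cmds


for argv in split_commands("app.o", [".text.foo", ".data.bar"], "libapp.a"):
    print(" ".join(argv))
```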

  Now, there turned out to be problems with this.  Objcopy is clever enough
when keeping or removing sections to garbage collect the symbol table and
only retain referenced symbols.  That's not quite right all the time,
though.  Particularly for ELF files, where you lose things like the
zero-absolute symbol and any of the section symbols that aren't referenced
by (pc-relative) relocs.  You also have a problem with intra-section relocs:
things don't work well if you try and keep a section symbol and relocs
against that symbol for a section that doesn't exist in the object file.

  IOW, the output file produced from "objcopy --only-section=.text" isn't a
valid and well-formed ELF.

[  In fact, this functionality is even more dubious than that.  It goes
through the symbol table, copying entries into a new symbol table where
needed and skipping them where not needed.  Then it creates an output bfd
made with this new symbol table, but with the old relocs from the original
input bfd, which still refer to the symbols in the input bfd's original
symbol table.  I don't know if that will DTRT or GIGO, but it seems unlikely
to be correct; I haven't actually been using or testing any of this
functionality in objcopy because I could already see it wasn't going to work
for my purposes anyway.  ]

  So the solution was a hacked-up version of objcopy, which I call objsplit.
Given a file with a bunch of sections in it, it iterates over the sections,
outputting each section to a separate file.  To get the symbol table right,
it has to:

a) include a zero symbol
b) have a section symbol for the section that is being kept
c) retain all the ordinary defined symbols contained in the section
d) retain all the undefined symbols referenced by relocs against the section
e) have a new symbol added, named after the section, which is just an
ordinary defined symbol, not a section symbol
f) have copies of all the section symbols from the *other* sections in the
file, but converted into undefined references, and with names that match the
sections.
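
  Steps a) to f) can be sketched on a toy model as follows.  This is not the
objsplit source and does not use BFD; the Symbol class, its "kind" strings
and the function name build_symtab are invented for illustration, and the
sketch assumes that a section symbol carries the same name as its section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str          # "zero", "section", "defined" or "undefined"
    section: str = ""  # owning section; empty for zero/undefined symbols


def build_symtab(kept, all_sections, symbols, reloc_targets):
    """Build the symbol table for the object that holds only section `kept`.

    symbols:       ordinary (non-section) symbols of the input object
    reloc_targets: names referenced by relocs against the kept section
    """
    table = [Symbol("", "zero")]                       # a) zero symbol
    table.append(Symbol(kept, "section", kept))        # b) kept section symbol
    table += [s for s in symbols                       # c) symbols defined in it
              if s.kind == "defined" and s.section == kept]
    table += [s for s in symbols                       # d) referenced undefineds
              if s.kind == "undefined" and s.name in reloc_targets]
    table.append(Symbol(kept, "defined", kept))        # e) ordinary symbol named
                                                       #    after the section
    table += [Symbol(sec, "undefined")                 # f) other sections become
              for sec in all_sections if sec != kept]  #    undefined references
    return table


sections = [".text.foo", ".text.bar", ".data.counter"]
symbols = [
    Symbol("foo", "defined", ".text.foo"),
    Symbol("bar", "defined", ".text.bar"),
    Symbol("counter", "defined", ".data.counter"),
    Symbol("puts", "undefined"),
    Symbol("malloc", "undefined"),
]
for sym in build_symtab(".text.foo", sections, symbols, {"puts"}):
    print(sym)
```

  The symbol from e) in one output file is what resolves the undefined
references created by f) in all the others.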

  Then it has to process the relocs, finding any that are against section
symbols from sections that aren't going to be included in the output file,
and fix them to refer to the new undefined section-named symbols we've
created.  That way we don't have object files with relocs against
non-existent sections; they all get converted to relocs against ordinary
symbols, and we ensure that each object file has a symbol that points to the
start of the section and has the same name as the section for the relocs in
all the other sections that we've split out.
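
  The reloc fixup can be sketched in the same toy style.  Again this is an
illustrative model rather than the actual implementation: the Reloc class and
the function name retarget_relocs are hypothetical, and it assumes a section
symbol has the same name as its section, so retargeting only has to change
the kind of symbol the reloc points at while leaving the offset and addend
alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reloc:
    offset: int       # where in the kept section the fixup applies
    sym_name: str     # name of the symbol the reloc is against
    sym_kind: str     # "section", "defined" or "undefined"
    addend: int = 0


def retarget_relocs(kept, relocs):
    """Point relocs against dropped sections at undefined section-named symbols."""
    fixed = []
    for r in relocs:
        if r.sym_kind == "section" and r.sym_name != kept:
            # The section is not in this output file, so refer to the
            # undefined ordinary symbol of the same name instead.
            r = replace(r, sym_kind="undefined")
        fixed.append(r)
    return fixed


relocs = [
    Reloc(4, ".rodata.str", "section", 8),   # against a section being dropped
    Reloc(12, "puts", "undefined"),          # already an ordinary undefined
    Reloc(20, ".text.foo", "section", 16),   # against the kept section itself
]
for r in retarget_relocs(".text.foo", relocs):
    print(r)
```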

  So, there's a major hunk of functionality over and above what a simple bit
of scripting could achieve with objcopy.  I threw it together over an
afternoon last weekend, and it works.  I successfully built my application
(using function and data sections), split the object files into individual
sections, added them to archives, linked the whole app together, and got a
working and valid output with various unreferenced bits left out.

  Now, the points for discussion/advice:

1)  First and foremost, does anyone think this functionality might be of any
interest or use to anyone except me?

2)  I don't know if any of this matters for non-ELF files; it may well be
the case that for simpler formats (e.g. COFF or a.out) "objcopy --only-section"
would indeed DTRT.  Does anyone else know off the top of their head?

  I made it by hacking about the objcopy source, throwing away a lot of
stuff I didn't want and adding functions as needed, so that leads to:

3)  Should it be reintegrated into objcopy, as a new option, rather than
implemented as an entire new tool?  It would be possible, certainly, but I
feel that a) objcopy is overloaded enough as it is, and b) a copy should be
a one-to-one operation and this is a one-to-many, so it doesn't really fit
the same conceptual scheme.

4)  Or should I integrate it into ar, despite the fact that doing so would
basically involve large chunks of code duplication from objcopy into ar (all
the symbol table gc mechanics, frex), with all the grief that implies?  [ Can
you tell I don't like this idea?! ]


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

