This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Concurrent linking plan for gold


Here is a plan for implementing concurrent linking for gold.  This
permits the linker to run in parallel with compilation, thus avoiding
the long link delay at the end of building a large program.  I would
be happy to hear any comments on this.

Ian


Goal

Concurrent linking means linking in parallel with compilation.  As
object files are compiled, they are provided to the linker.  The
linker links them into the executable while other compilations are in
progress.  The link completes shortly after the final compilation is
complete.


Interface

We want to make it possible to use concurrent linking in conjunction
with make.  We also want to make it possible to use it with newer
build systems.  The linker must deal with inputs in two different
orders.  The first is the order in which they appear on the command
line, which determines symbol resolution.  The second is the order in
which the files are available to be linked.

The new option --start-concurrent tells the linker that the input
files which follow on the command line should not be linked until they
are available.  The option --end-concurrent restores the normal
behaviour.

We need a mechanism to tell the linker when files are available.  We
support two: inotify for systems which support it, or a pipe for all
systems.

Inotify interface

The option --concurrent-inotify will direct the linker to use an
inotify based system.  With this system, the linker will use inotify
to watch the files named on the command line.  A file will become
available when:
    1) it is closed after being opened for write access, or
    2) the file was not opened, but the modification time is changed.

When the linker is invoked with the --concurrent-inotify option, it
will initialize a set of inotify watches, and then put itself into the
background.  The foreground process will print a string on stdout and
then exit.  Thus when this initial process exits successfully, the
build system may begin building the concurrent input files.  As each
file is either closed or touched, the linker will pick it up and
include it in the link.

When all the input files have been created, the linker should be
invoked again, passing only the option --concurrent-wait=ARG, where
ARG is the string printed by the first linker invocation (or maybe ARG
should be the name of a file).  This invocation of the linker will
contact the background process, and will exit when the link is
complete.

The intention is that a make based system could work roughly like
this:

foo-concurrent-link: foo.c bar.c
	$(CC) -o foo -Wl,--concurrent-inotify -Wl,--start-concurrent foo.o bar.o -Wl,--end-concurrent > foo-concurrent-link

foo.o: foo-concurrent-link foo.c
	if test -n "$(filter-out foo-concurrent-link,$?)"; then \
	  $(CC) -c foo.c; \
	else \
	  touch foo.o; \
	fi

bar.o: foo-concurrent-link bar.c
	if test -n "$(filter-out foo-concurrent-link,$?)"; then \
	  $(CC) -c bar.c; \
	else \
	  touch foo.o; \
	fi

foo: foo.o bar.o
	$(LD) --concurrent-wait=`cat foo-concurrent-link`

I wrote using the GNU make filter-out function, but it could have been
written using sed instead to work with any standard make.

When one of the input files is found via the -L options, such as an
archive specified with -l, the linker will use directory watches to
look for the file.

It will probably be desirable to have a --concurrent-wait option which
tells the linker to abort and not update the output file.


Pipe interface

The pipe interface uses a named pipe rather than inotify.  The option
--concurrent-pipe=PATHNAME directs the linker to use a named pipe.
The linker will create the pipe if it does not already exist.  As with
--concurrent-inotify, the linker will put itself in the background,
and the foreground process will print a string.

As each concurrent input becomes available, the build procedure must
write the name to the named pipe.  The name should be written as a
null terminated string.

When all the input files have been created, the linker should be
invoked again with the --concurrent-wait option, as described above.
If any files have not been reported via the named pipe, they will be
assumed to be complete at this time.

Possible make implementation:

foo-concurrent-link: foo.c bar.c
	mkfifo foo-pipe
	$(CC) -o foo -Wl,--concurrent-pipe,foo-pipe -Wl,--start-concurrent foo.o bar.o -Wl,--end-concurrent > foo-concurrent-link

foo.o: foo-concurrent-link foo.c
	if test -n "$(filter-out foo-concurrent-link,$?)"; then \
	  $(CC) -c foo.c; \
	fi
	echo -ne "foo.o\000" > foo-pipe

bar.o: foo-concurrent-link bar.c
	if test -n "$(filter-out foo-concurrent-link,$?)"; then \
	  $(CC) -c bar.c; \
	fi
	echo -ne "bar.o\000" > foo-pipe

foo: foo.o bar.o
	$(LD) --concurrent-wait=`cat foo-concurrent-link`
	rm -f foo-pipe

A variant of this approach would be to send the complete object
contents on the pipe, rather than just the name.  This might be useful
in some scenarios, particularly if we can have the pipe be a socket
instead.  It does not significantly change the internal
implementation.


Internal implementation

One very simple way to implement this is to have the linker work as it
normally does, but, before it opens a file, wait until the file is
known to be ready.  This permits the linker to set up and read the
startup files (crt1.o, crtbegin.o) in parallel with compilation.  If
the build process arranges to process the files as much as possible in
the order in which they are provided to the linker, there should be
significant overlap in linker processing and compilation.

The more general case, in which the linker can process the files in
any order, is more complicated.  Here is a sketch of how it might
work.  Naturally these operations are all conditional on doing a
concurrent link.  This more general approach will not work when using
a linker script.

We break the serialization for reading the symbol table in
queue_initial_tasks.  Instead of having each Add_symbols task block
the next one, we let them run in any order, like the Scan_relocs
tasks.

As we add the symbols for an object, we note whether we have already
added the symbols for all preceding objects.  If we have not, the
defined symbols are marked as tentative.  When doing symbol resolution
against a tentative symbol, we need to compare the object associated
with the tentative symbol with the object we are processing; if the
object we are processing precedes the object associated with the
tentative symbol, we resolve as though we had seen the symbols in the
reverse order.

Each Add_symbols task kicks off a Read_relocs task right away; we
don't want to wait for queue_middle_tasks.  In Scan_relocs when we see
a relocation against a tentative symbol we simply record it in a table
to be scanned later.

We remove the synchronization point at Layout::finalize.  Instead,
each Scan_relocs task starts a Relocate_task.  Before we do any
Relocate_tasks we guess at segment sizes and file positions.  We have
three segments: text, data, and bss.

Each Relocate_task puts sections into the relevant segments.  If they
will be too large--i.e., if they will overlap with the following
segment--the Relocate_task starts a new segment later in the file.
When the Relocate_task processes relocations, any relocation against
an undefined or tentative symbol is recorded in a table.  Note that
this means that only one Relocate_task process can be run at a time.

The relocation tables used by Scan_relocs and Relocate_task can be
associated with a symbol.  When a symbol gets a non-tentative
definition, we can scan and process the relocations at that time.
Doing it this way rather than doing them all at the end should permit
some concurrency in their processing.

When we've scanned the relocations for the last object, we can build
the GOT and PLT sections, and put them in the appropriate segment.
This will be handled via a general mechanism in Output_section_data:
we will add a flag to indicate that the Output_section_data must not
be written out until all input files have been seen.  We will then
walk through the list at the end of the link, assigning file positions
for the remaining Output_section_data entries and writing them out.

The end symbol will come at the end of the last segment, which may or
may not be a bss segment.  If symbols like etext or edata or bss_start
are referenced, we should issue a warning, as they may not have the
expected meaning when there are multiple segments.

Note that this will require a bunch more memory for the tables of
relocations which require further processing.  We could save some
memory by keeping two bitmaps per input file: one for the relocations
which still need to be scanned, one for the relocations which still
need to be processed.  This may not be an issue in practice.

Note that with this approach sections will not necessarily be
contiguous.  This means that we need special care for certain sections
which must be contiguous, such as .ctors, .dtors, or any section for
which there is a reference to a __start or __stop symbol (these are
the sections whose names are representable in C).  In such a case, if
the linker runs out of room for a section, it must move the entire
section.  This means in turn that the linker must remember every
relocation against such a section.


Interaction with incremental linking

We would eventually like to implement concurrent incremental linking.
Assuming the existence of incremental linking, the concurrent aspect
is straightforward: as each input file becomes ready, we treat it as a
single incremental object.  We repeat this for each concurrent input
file.

One issue is that incremental linking will look at the timestamps of
the files to decide whether they need to be adjusted.  This does not
work well with the proposed concurrent inotify interface, in which the
timestamps are changed.  This suggests that for an incremental
concurrent inotify link, if we are looking for FILE.o, we also look
for FILE.o.concurrent.  If FILE.o.concurrent changes, then we check
the timestamp of FILE.o itself to see whether to include it in the
incremental concurrent link.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]