This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Adding systemtap probe points in pthread library


Firstly, you need to consult the experts on whatever MUA you are using on
how to get code and patches into your messages unmolested.  Code samples in
discussion need to be legible, and patches need to apply cleanly.  We prefer
plain old-fashioned unmolested text rather than attachments, but an intact
attachment is better than a mangled paste.

I don't see any reason for the extra pile of individual macros.
This needs to be conditionalized in the libc code anyway, and
there is no reason not to use just one universal macro.

I've made a new branch in glibc git, roland/systemtap.  That has configure
and macro magic for --enable-systemtap that fits into the libc way of doing
things.  I added just one of the libpthread probes to demonstrate using the
LIBC_PROBE macro.  You can start from that branch and add your other probes
using this new macro.
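
To make the "one universal macro" idea concrete, here is a minimal sketch of
the pattern.  It is not the LIBC_PROBE definition from the branch: the names
PROBE_DEMO, ENABLE_PROBES_DEMO, the "demo" provider, and demo_start_thread
are all made up for illustration.  The only outside name it assumes is
STAP_PROBE1 from systemtap's sys/sdt.h, and only in the enabled case.

```c
/* Sketch only.  PROBE_DEMO and ENABLE_PROBES_DEMO are hypothetical names,
   not what the roland/systemtap branch actually defines.  */
#ifdef ENABLE_PROBES_DEMO
/* Assumes systemtap's <sys/sdt.h> is installed and provides
   STAP_PROBE1 (provider, name, arg1) and friends.  */
# include <sys/sdt.h>
# define PROBE_DEMO(name, n, ...) \
  STAP_PROBE##n (demo, name, ##__VA_ARGS__)
#else
/* Probes configured out: the macro expands to an empty statement, so the
   call site costs nothing and its arguments are never evaluated.  */
# define PROBE_DEMO(name, n, ...) do { } while (0)
#endif

/* Hypothetical call site.  The source carries exactly one line per probe,
   with no #ifdef at the point of use.  */
int
demo_start_thread (int arg)
{
  PROBE_DEMO (demo_start, 1, arg);
  return arg * 2;
}
```

Built as shown, the probe line compiles away entirely.  Building with
-DENABLE_PROBES_DEMO (and sys/sdt.h available) would be the analogue of
configuring with --enable-systemtap, under the assumptions stated above.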

As to probes in assembly code, that is not a particular problem on the libc
side.  That is, we're going to scrutinize extremely closely any probes of
any kind and look concretely at the instruction-level effect they have,
whether in compiled code or assembly code.  As to the invasiveness in the
source, that's not a problem if it's only with one simple, clean macro, and
in a reasonable and worthwhile set of places, irrespective of whether it's
.c or .S files that are touched.

libc code where you might need probes includes both pure assembly (.S) and
inline asm in C files.  There are no sys/sdt.h macros to support defining
probes in either of these situations.  There certainly could be (it's just
an issue of more macros to emit the same .probes details in slightly
different ways), but that needs to be implemented in systemtap first.
(We can discuss that on the systemtap list with my other hat on.)

Now, as to the subject of overheads.  The subject of where to place probes
so they are hit in hotter or cooler paths is really an issue for the
interests and experiences of people doing the tracing.  What we will
scrutinize most closely (and first) is just any overhead for no probes
being enabled at runtime.

Those gross application benchmarks and focused microbenchmarks are good to
do, and indeed we'd react badly if they showed any slowdowns with probes
disabled.  But libc is used in so many different places (everywhere,
really) that you can never really test a "representative" sample.  Both of
those tests in particular ignore process startup time, for example.  But we
know that for the system overall, tiny changes in dynamic linker startup
time can make a real difference.  So we're going to look at the exact
changes to the generated code and judge them as acceptably minimal or not
at the lowest level.

Just to start with, my branch has just the one probe point
(pthread_start).  Using systemtap-1.3's sys/sdt.h (i.e. sdt-v2),
I built libpthread with and without --enable-systemtap to see
the effect of only that one simple probe point.  

Indeed it added just the one byte "nop" to the actual instruction stream.
But it did perturb the compiler's code generation slightly.  In fact, it
turns out the text comes out a little smaller.  But more analysis is
required to see if the code is better, worse, or indifferent.  (One
difference is it decided not to emit an alignment directive in one place.
I have no idea why.  The other differences are things like register
selection.)

However, it added four new dynamic relocs in the .probes section data.
Those cost time at startup (though using prelink compensates for that).
At four relocs per probe point, these really add up.  The acceptable
number is zero.  The data segment bloat for .probes being there is of
concern too.

It's not really hard to rework the sdt.h scheme so that it won't ever
generate dynamic relocs, and it needn't actually have any runtime memory
footprint of any kind (aside from the one nop per probe in the text).
(We can discuss that too on the systemtap list with my other hat on.)
But before that's implemented in systemtap, the libc maintainers would
certainly not recommend to any system builders that they enable these
probe points in performance-critical code like libpthread.  That won't
necessarily hold back merging the optional probe points into the source.
But it might--we'll probably want to just wait and see a new optimal and
assembly-friendly version of the sdt.h macros before we put anything in
mainline libc at all.


Thanks,
Roland

