This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: Systemtap vs Dtrace web page corrections.

From: "James Dickens" <jamesd dot wi at gmail dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: systemtap at sources dot redhat dot com
Date: Wed, 13 Sep 2006 13:51:55 -0500
Subject: Re: Systemtap vs Dtrace web page corrections.
References: <cd09bdd10609122024l24ee46eax32feac2d5f374526@mail.gmail.com> <y0md59zlw53.fsf@ton.toronto.redhat.com>

On 13 Sep 2006 09:43:36 -0400, Frank Ch. Eigler <fche@redhat.com> wrote:

James Dickens wrote:

> [...]
> Kernel Lock in:
> [...] Systemtap isn't even stable enough to run a complex script written 6
> months ago

That claim is at odds with a number of scripts in the test suite that
have survived unmodified for over a year.  Perhaps you would care to
share yours?


> So on this line, it would be that Systemtap has the greater amount
> of lock in, not just in the kernel but Systemtap it self and
> possibly even the tools used to compile the script.

You misunderstand the reference to "lock step" or "lock in".  The
issue is the extent to which systemtap requires cooperating patches in
the linux kernel.

Since DTrace is included in every release of Solaris 10 and
OpenSolaris, this lock-in is not an issue: the user simply executes
his code. He never has to patch or recompile his kernel to include the
necessary modules.

If the writer of the script is using or probing public interfaces of
the Solaris kernel, his script will continue to run well into the
future. Solaris 10 and OpenSolaris currently run binaries that were
released over 10 years ago for Solaris 2.5.1, so it seems reasonable
that the same level of compatibility will be maintained for code and
scripts written for Solaris 10.

Many of your other paragraphs similarly mix or confuse concepts such
as maturity, utility, capability.  The purpose of the wiki page is
specifically to pry apart the issues.


> [...] Your scripts are basically C code modified to make it
> compatible with a C compiler.  If one turns on guru mode, you are
> writing 100% C code, no way to consider it anything but C.

You misunderstand the scripting language and its implementation.
dtrace parses its language and generates byte codes; systemtap parses
its own and generates C code.  They are both full-fledged language
translators.

Unless it is in guru mode, in which case all code between  %{  and  %}
is passed directly to the compiler. That will become necessary more
often, as tapsets will not be available for the userland applications
being probed, and those applications may not have debugging
information either.

Often a bug will not occur when a program is compiled with debugging
information.

On the execution side, dtrace has a bytecode interpreter, while
systemtap uses native code so no interpreter is needed, but that is
not relevant to the nature of source language.


> Control structures:
> Systemtap has full control structures as stated, but if a bug happens
> in the Systemtap script it can cause the box to crash.
> DTrace: doesn't have functions or loops per se, but you can work
> around this with a little thought.  [...]

You are mixing things up.  Systemtap control structures have nothing
to do with crashing boxes.  Primitives like conditionals, loops,
recursion are translated to C code that includes protection against
accidents like infinite loops or recursion.  The "halting problem" is
a red herring.  I am aware of no bug related to control structure
capability appearing in many months.

One example is a systemtap script in guru mode (necessary to follow a
pointer in a kernel module or userland program without debugging
information) that is searching for the end of a linked list:

struct record
    {
    int c;
    struct record *next;
    } *p;
int n = 0;
p = record_set;
while ( p )  { n++;  p = p->next;  }
printf ( "record_set has %d records\n", n) ;

In the case where a next pointer is corrupted, or points back to an
earlier record in the list so that the walk never reaches a null
pointer, how would systemtap deal with this?
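
To make the question concrete, here is a minimal sketch in plain C of a
walk that carries its own bound. It is not systemtap's generated code.
MAX_STEPS and count_records_bounded() are hypothetical names chosen for
illustration, and the sketch only guards against a list that loops back
on itself. A wild pointer to unmapped memory would additionally need a
fault-safe read, which is not attempted here.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the record in the example above. */
struct record {
    int c;
    struct record *next;
};

/* Hypothetical step budget, chosen for illustration only. */
#define MAX_STEPS 10000

/*
 * Count the records reachable from head.
 * Returns the count, or -1 if the walk would exceed MAX_STEPS, which is
 * what happens when a corrupted next pointer makes the list cyclic.
 */
static int count_records_bounded(const struct record *head)
{
    int n = 0;

    for (const struct record *p = head; p != NULL; p = p->next) {
        if (n >= MAX_STEPS)
            return -1;          /* give up instead of spinning forever */
        n++;
    }
    assert(n <= MAX_STEPS);     /* the bound held on every normal exit */
    return n;
}
```

Calling count_records_bounded(record_set) on a healthy list returns the
number of records; on a list whose last record points back to the first
it returns -1 after MAX_STEPS steps rather than hanging.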


> Variable Typing:
> [...] Because by implicitly setting the type of the variable you are
> now locked into that type, if the code you are probing changes a
> variable type your script no longer works. [...]

An example would help me understand the kind of change you envision as
posing a problem, and how dtrace would deal with the same issue.


The easiest example is probing a function call whose argument types
you do not know. Just printing the values passed into the function
could give the user insight into how the program works. This is
useful in reverse engineering a proprietary driver.

Since dtrace ignores the argument type until told to cast it or read
its value, it has no problem with this case.

fbt:somemodule::entry { printf ( "arg1= %ld arg2=%ld\n", arg1, arg2); }

It also comes into play when you are working with a precompiled script
trying to probe a function whose prototype has changed since the
script was last compiled.


> When Systemtap gets userland probes its inferred variable typing will
> become even more of a hindrance, lots of programs don't ship with
> debugging code embedded, so if you don't have the source code to
> recompile with debugging information Systemtap is useless, with DTrace
> it will try and make guesses at the data structure and include files
> and allow the user to type cast variables as needed, Systemtap does
> not have the native ability to process include files or handling data
> of unknown types.

It would be easier to answer your writings if they were broken into
sentences with individual propositions.  Many of your ", so" splices
don't actually logically follow..

Sorry, it was late; I should have waited and re-read it before posting.


> Complex Reports:
> If Systemtap's report generating ability is so great why is there work
> done on a dashboard that is designed to make the reports look better?

Perhaps it's for the same reason that dtrace benefits from a graphical
front-end.

> What limitations are you seeing in DTrace's report generating capability?

One appears to need post-processing perl scripts to generate anything
complex, like for example if data from more than one array needs to be
grouped together.

http://mail.opensolaris.org/pipermail/dtrace-discuss/2005-November/000602.html

shows the combining of multiple aggregations.

> Speculative tracing:
> Systemtap: you can't just wish this requirement away, you can't judge
> whether or not you need a piece of data, in a complex system when the
> first event occurs, you have to store it until possibly many other
> events have occurred [...]

In the systemtap script language, one can implement the gist of this
from first principles: store that piece of data in global variables,
then emit actual trace messages later when the data is known to be
relevant.  One variant of this is an exercise in the tutorial.

from the dtrace guide:

For example, if a system call is occasionally failing with a common
error code (for example, EIO or EINVAL), you might want to examine the
code path leading to the error condition. To capture the code path,
you could enable every probe — but only if the failing call can be
isolated in such a way that a meaningful predicate can be constructed.
If the failures are sporadic or nondeterministic, you would be forced
to trace all events that might be interesting, and later postprocess
the data to filter out the ones that were not associated with the
failing code path. In this case, even though the number of interesting
events may be reasonably small, the number of events that must be
traced is very large, making postprocessing difficult.
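
Both approaches discussed here (globals in a systemtap script, or
dtrace's speculation facility) come down to the same pattern: buffer
events tentatively, then either emit or drop them once the outcome is
known. A minimal sketch of that pattern in plain C follows. The names
spec_buffer, spec_record, spec_commit and spec_discard, and both size
limits, are hypothetical and belong to neither tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SPEC_CAPACITY 64    /* hypothetical limit on buffered events */
#define SPEC_MSG_LEN  80    /* hypothetical limit on one event's text */

struct spec_buffer {
    char   events[SPEC_CAPACITY][SPEC_MSG_LEN];
    size_t count;           /* events currently held */
    size_t dropped;         /* events lost because the buffer was full */
};

/* Tentatively record one event; nothing is printed yet. */
static void spec_record(struct spec_buffer *sb, const char *msg)
{
    assert(sb != NULL && msg != NULL);
    if (sb->count == SPEC_CAPACITY) {
        sb->dropped++;      /* full: count the loss, keep older events */
        return;
    }
    snprintf(sb->events[sb->count], SPEC_MSG_LEN, "%s", msg);
    sb->count++;
}

/* The events turned out to matter: write them to out and empty the
 * buffer. Returns the number of events written. */
static size_t spec_commit(struct spec_buffer *sb, FILE *out)
{
    size_t emitted = sb->count;

    for (size_t i = 0; i < sb->count; i++)
        fprintf(out, "%s\n", sb->events[i]);
    sb->count = 0;
    return emitted;
}

/* The events turned out to be irrelevant: forget them silently. */
static void spec_discard(struct spec_buffer *sb)
{
    sb->count = 0;
}
```

A caller would invoke spec_record() at each probe hit along the code
path, then spec_commit() if the call returns the error of interest and
spec_discard() otherwise. The fixed capacity is the design cost: events
past the limit are lost, which the dropped counter makes visible.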


> Number of probe points:
> System tap, has too much bloat to have a truly unlimited number of
> probes [...] 1 million active probes requires 64MB of
> kernel space allocated [...]

You conflate the number of potential vs. actually activated probes.

Systemtap is currently limited to kernel probes, so constructing sane
examples with a large number of probes is not possible. When userland
probes are available, many more probes will be possible and even
necessary. Probing the entry and exit points of every function in
Mozilla comes to about 500,000 probes. Will systemtap be able to
handle this? What about the case where multiple large scripts are
running on the same system, or where one script is probing every
function in a desktop environment such as KDE running a few core apps?
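
For scale, the arithmetic can be sketched in C, taking the 64MB per
million probes figure quoted above at face value and assuming the cost
grows linearly with the probe count. Both are assumptions, not
measurements, and estimated_probe_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in this thread, not measurements. */
#define QUOTED_BYTES   (64ULL * 1024 * 1024)   /* 64MB, read as MiB */
#define QUOTED_PROBES  1000000ULL              /* 1 million probes */

/*
 * Estimated kernel memory, in bytes, for a given number of active
 * probes, assuming the cost scales linearly with the probe count.
 */
static uint64_t estimated_probe_bytes(uint64_t probes)
{
    assert(probes <= UINT64_MAX / QUOTED_BYTES);   /* no overflow */
    return probes * QUOTED_BYTES / QUOTED_PROBES;
}
```

Under those assumptions, estimated_probe_bytes(500000) gives 33554432
bytes, or 32MB, for the Mozilla case, and about 67 bytes per probe.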

> What is the highest number of Systemtap probes ever activated in an
> active script with out the machine falling over?  In my tests of
> DTrace it is over 500,000.  [...]

This is not sufficiently constrained to be answered.  One can place
millions of probes that are never hit - or one probe that is hit
millions of times per second - or somewhere in between.

Probing every function in the kernel led to bug #2685,
http://sourceware.org/bugzilla/show_bug.cgi?id=2685, where even
something as simple as

probe begin { log("begin") }
probe kernel.function("*") { print(".") }
probe end { log("end") }

hung x86_64 systems. Shouldn't your protection code prevent the
machine from being hung?


> Of course a good question is, what kind of kernel coder can't debug
> a problem when he knows what functions are being called, and by what
> function, how often and how much time its taking, and what functions
> it calls and complete userland and kernel stack tracing. Seems to be
> a pretty silly to do all that extra work and risk system stability
> in the name of probing every line in the kernel.

Practicing software engineers tend sometimes to put debugging code
such as printfs into places other than the beginning or ends of
functions.  Talk to one to find out why.

I know why, and have added them to my own code, but I remove such code
when it is ready for production use. Just because systemtap will be
able to probe any line in the kernel doesn't mean that ability will be
used, because it is easier just to add printfs to the source code, and
you won't have to debug a systemtap script in the process. In general,
if you can't follow your logic even though you know all of the above,
you should probably rewrite the code anyway.

> [...] Can Systemtap still probe arbitrary points in code if you
> don't have a binary with debugging information intact?

The granularity will likely be smaller - something closer to dtrace.

in userland, dtrace can probe any assembly language instruction.

From the dtrace guide:

Function Offset Probes
The pid provider lets you trace any instruction in a function. For
example to trace the instruction 4 bytes into a function main(), you
could use a command similar to the following example:

pid123:a.out:main:4

Every time the program executes the instruction at address main+4,
this probe will be activated. The arguments for offset probes are
undefined. The uregs[] array will help you examine process state at
these probe sites. See "uregs[] Array" on page 352 for more
information.


> Concurrent probes on multiprocessors:
> [...]
> You can even run multiple copies of the same script with out
> problems, last time I tried this simple test on Systemtap, it
> failed.


While this has nothing to do with multiprocessors, I am aware of no
such problem with systemtap.  Care to share your script?

this one seems to be a good place to start....

probe begin { log("begin") }
probe kernel.function("*") { print(".") }
probe end { log("end") }

Here is the same thing in dtrace, with a probe that fires after 1
minute and exits the script. I successfully ran 10 concurrent
instances of it, with the output being sent over an ssh link, so each
instance was generating more probe firings. dtrace did kill off one or
more instances of the script because the workload made the system
unresponsive, but the machine did not crash, and 2 minutes later it
returned to normal.

for i in 1 2 3 4 5 6 7 8 9 10 ; do dtrace -qs kernelfunction_probes.d & done

kernelfunction_probes.d
:::: {printf(".");}

tick-60s { exit(0); }

I have tested the dtrace version on single-processor systems, on
2-processor systems, and even on one with 8 cores that handles 32
threads concurrently, so it tests all aspects of dtrace and any
locking that may occur within probe firings.


> [...] DTrace also understands the C struct construct so it can also
> read data stored in structs and use pointers stored in structs to
> access other data even if the program isn't compiled with debugging
> information. Systemtap requires debugging information to access data
> in structures even if the user knows the layout of the data
> structures. [...]

Without debugging information, this would require access to and
parsing of exactly matching header files from the source code.  Are
you aware of the complications involved in this - and the likelihood
of producing garbage data (albeit safely) for mismatches?  In any
case, nothing precludes future inclusion of this sort of function.

As a developer, I can send the user a script with a struct embedded in
it; the script can then access members of that struct. I have him run
it and return the data. The majority of my programs will be shipped
without debugging information, yet I can still use the power of dtrace
to diagnose the problem.
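
The idea can be sketched in plain C: the developer supplies the struct
layout, and the raw bytes read from the traced program are interpreted
through it, with no debugging information involved. This is only an
illustration of the technique, not how dtrace implements it; struct
conn_info and decode_conn_info() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout the developer ships alongside the script. */
struct conn_info {
    int  fd;
    int  state;
    long bytes_sent;
};

/*
 * Interpret raw bytes copied out of the traced program as a conn_info.
 * Returns 0 on success, or -1 if fewer bytes were captured than the
 * declared layout needs.
 */
static int decode_conn_info(const void *raw, size_t raw_len,
                            struct conn_info *out)
{
    assert(raw != NULL && out != NULL);
    if (raw_len < sizeof *out)
        return -1;
    memcpy(out, raw, sizeof *out);  /* memcpy avoids alignment problems */
    return 0;
}
```

The length check catches a short read and nothing more. If the declared
layout does not match the one the binary was built with, the fields
decode to garbage without any error, which is the mismatch risk raised
in the reply quoted above.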

How many informed users will be willing to run a program with guru
mode enabled on a production server? That is what is currently
necessary to achieve the same ability in systemtap on a program that
lacks debugging information.

James

- FChE

References:
- Systemtap vs Dtrace web page corrections.
  - From: James Dickens
- Re: Systemtap vs Dtrace web page corrections.
  - From: Frank Ch. Eigler
