This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Looking for recommendation for using SystemTap


Tony Reix <tony.reix@bull.net> writes:

> [...]
> The analysis of the Oopses clearly shows that "someone" writes strings
> (like "ata" or "ejbo") randomly in memory and destroys links in
> structures, like vmlist used by get_vmalloc_info in fs/proc/mmu.c or
> ulp->proc_list used by loop_undo in ipc/sem.c .
> [...]
> Do you think SystemTap can help me find the culprit ?
> [...]

Perhaps.  Does the memory corruption occur in predictable places?
Imagine a probe that runs periodically (via a frequently triggered
timer, or a breakpoint at a code point under suspicion).  That probe
could look through selected places that are corrupted, and check for
something suspicious.  For example:

  #! stap -g
  probe kernel.function("after_your_function") { if (checkstuff ()) log ("bug") }
  function checkstuff () /* .... */

What checkstuff() does depends on how a program may be able to assess
corruption.  If it's ASCII strings showing up within known regions of
valid memory, something like this naive search could do it.  (Such a
function could be encapsulated in the systemtap tapset library.)

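  function checkstuff () %{
    /* placeholder bounds of the region to scan: [begin, end) */
    char *begin = (char *) 0xdeadbeef;
    char *end = (char *) 0xdeadf00d;
    int found = 0;
    char *p;
    /* p+3 <= end keeps p[2] inside the region */
    for (p = begin; p+3 <= end; p++)
      if (p[0] == 'a' && p[1] == 't' && p[2] == 'a') { found = 1; break; }
    THIS->__retval = found;
  %}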
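
The same scan can be tried out in user space before it goes into a
guru-mode script.  The following is only a sketch: region_contains()
is a hypothetical helper name, not part of systemtap or any tapset,
and it assumes the region [begin, end) is readable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a systemtap API): returns 1 if the byte
   pattern pat[0..pat_len) occurs anywhere inside [begin, end), else 0. */
static int region_contains(const char *begin, const char *end,
                           const char *pat, size_t pat_len)
{
    const char *p;

    assert(begin <= end);
    if (pat_len == 0)
        return 0;
    /* Stop as soon as fewer than pat_len bytes remain, so memcmp
       never reads past the end of the region. */
    for (p = begin; (size_t)(end - p) >= pat_len; p++)
        if (memcmp(p, pat, pat_len) == 0)
            return 1;
    return 0;
}
```

Passing the pattern as a parameter lets one routine check for "ata",
"ejbo", or whatever other string the corruption turns out to leave
behind.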

Later, we will have hardware-assisted watchpoint probes that hit when
a designated area of memory is read and/or written.  That could narrow
the culprits down even further.  This might look something like:

  probe kernel.watch.from(0xdeadbeef).to(0xdeadf00d).string("ata")
    { log ("bug") }


Anyway, this all depends on being able to characterize the corruption
well enough that a routine could be written to safely check for it.
If you don't have even that much information, very drastic measures
may be necessary (such as running the kernel under a simulator or
debugger).


- FChE

