This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Looking for recommendation for using SystemTap


Tony Reix wrote:

Hi,

I'm having several Oopses while running tests of an application which
has:
- one patch applied to the kernel
- one kernel module
The analysis of the Oopses clearly shows that "someone" writes strings
(like "ata" or "ejbo") randomly in memory and destroys links in
structures, like vmlist used by get_vmalloc_info in fs/proc/mmu.c or
ulp->proc_list used by loop_undo in ipc/sem.c.
Maybe my code is the culprit, or not.

Do you think SystemTap can help me find the culprit?


Yes, it can help you narrow down the areas to look for.

If yes, do you have recommendations and proposals about how to use
SystemTap for that goal?


A general proposal is below.

Can you point me to documentation covering the basics of using
SystemTap in practice?


Folks have used SystemTap to solve real-life problems, so it is ready for use. The main documentation to look at is the tutorial on the web and the man pages that come with the install. If you don't find that sufficient to get going, feel free to send a note to the mailing list and we will help you ASAP.

Coming to your problem, it looks like you are observing memory corruption. We don't have the watchpoint feature implemented yet, which would let you print the contents of a data structure whenever someone modifies the memory at a given address.

With the features we currently have, my suggestion would be the following steps:
1) Identify the common code paths taken when you run your workload.
2) Install a few probes in those code paths.
3) Have the probe handlers print out the contents of the pointers you suspect are getting corrupted. If a global variable is getting corrupted, it is even easier.
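As a rough illustration of steps 2 and 3, a probe script might look like the sketch below. The module name `mymod`, the function `mymod_update` and its `entry` parameter are hypothetical placeholders for your own module's code; `get_vmalloc_info` is the kernel function you mentioned. Reading target variables such as `$entry` requires debug info for the kernel and the module.

```systemtap
# Sketch only: "mymod", "mymod_update" and its "entry" parameter are
# hypothetical placeholders for your own module's code.

# Log every call into the module function that touches the suspect
# list, together with the pointer value it was handed.
probe module("mymod").function("mymod_update") {
    printf("%s[%d] %s entry=0x%x\n", execname(), pid(), probefunc(), $entry)
}

# get_vmalloc_info() walks vmlist; a backtrace here shows which path
# reached it shortly before an Oops.
probe kernel.function("get_vmalloc_info") {
    printf("%s[%d] in get_vmalloc_info\n", execname(), pid())
    print_backtrace()
}
```

Run it as root with `stap` while the workload runs; the last lines printed before an Oops show which callers and pointer values were involved most recently.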


The goal of the above exercise is to first narrow down where the corruption is happening.
If this doesn't work, you could write a probe that fires periodically at a fixed interval and dumps the contents of the suspected data structures. Here is the skeleton for a timer probe:


probe timer.ms(5000) {
       /* fires every 5000 ms (5 s); lower the interval to sample more often */
       /* code to print your data structures */
}
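A slightly fuller version of the timer idea is a crude polling "watchpoint": remember the address of a word you care about, then compare its value on every tick. This is a sketch under assumptions: `mymod`, `mymod_update` and `$entry` are hypothetical names again, and a 10 ms poll only notices a change after the fact. It does not identify the instruction that made the change, and it also reports legitimate updates made by your own code.

```systemtap
# Hypothetical names: "mymod", "mymod_update" and its "entry" parameter
# stand in for your own module's code.
global watched_addr, last_val

# Remember the first address passed in, and the word stored there.
probe module("mymod").function("mymod_update") {
    if (watched_addr == 0) {
        watched_addr = $entry
        last_val = kernel_long(watched_addr)
    }
}

# Every 10 ms, re-read the word and report any change
# (including legitimate updates made by your own code).
probe timer.ms(10) {
    if (watched_addr == 0)
        next
    val = kernel_long(watched_addr)
    if (val != last_val) {
        printf("word at 0x%x changed: 0x%x -> 0x%x\n",
               watched_addr, last_val, val)
        last_val = val
    }
}
```

If the watched object is freed, the values read afterwards are meaningless, so pick an address whose lifetime spans the test run.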


Hope this helps.


Thanks,

Tony





