This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Bug runtime/15982] New: process.end probes broken on RHEL7
- From: "dsmith at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sourceware dot org
- Date: Thu, 26 Sep 2013 21:36:55 +0000
- Subject: [Bug runtime/15982] New: process.end probes broken on RHEL7
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=15982
Bug ID: 15982
Summary: process.end probes broken on RHEL7
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: runtime
Assignee: systemtap at sourceware dot org
Reporter: dsmith at redhat dot com
On RHEL7, the UTRACE_P5_01_cmd subtest of the utrace_p5.exp testcase is failing
somewhat randomly. This test basically does something like the following:
====
# stap -e 'probe process.end { printf("end\n") }' -c whoami
dsmith
====
After lots of debugging, I believe we've got a timing issue.
Here's my current theory. When the process ends, the kernel sends the process'
parent a SIGCHLD to let it know one of its children has died (via a call to
do_notify_parent()). Only after that does the kernel call the utrace hook to let
systemtap know the process has died (via a call to tracehook_report_death()).
Here's the relevant code from exit_notify(), in kernel/exit.c.
====
signal = tracehook_notify_death(tsk, &cookie, group_dead);
if (signal >= 0)
        signal = do_notify_parent(tsk, signal);
tsk->exit_state = signal == DEATH_REAP ? EXIT_DEAD : EXIT_ZOMBIE;
/* mt-exec, de_thread() is waiting for us */
if (thread_group_leader(tsk) &&
    tsk->signal->group_exit_task &&
    tsk->signal->notify_count < 0)
        wake_up_process(tsk->signal->group_exit_task);
write_unlock_irq(&tasklist_lock);
tracehook_report_death(tsk, signal, cookie, group_dead);
====
When stapio, the userspace portion of systemtap, gets the SIGCHLD, it
immediately turns around and tells the module to quit. Here's the code from
staprun/mainloop.c:
====
pid_t pid = waitpid(-1, &chld_stat, WNOHANG);
if (pid != target_pid) {
        return;
}
if (chld_stat) {
        // our child exited with a non-zero status
        if (WIFSIGNALED(chld_stat)) {
                warn(_("Child process exited with signal %d (%s)\n"),
                     WTERMSIG(chld_stat), strsignal(WTERMSIG(chld_stat)));
                target_pid_failed_p = 1;
        }
        if (WIFEXITED(chld_stat) && WEXITSTATUS(chld_stat)) {
                warn(_("Child process exited with status %d\n"),
                     WEXITSTATUS(chld_stat));
                target_pid_failed_p = 1;
        }
}
dbug(2, "sending STP_EXIT\n");
rc = write(control_channel, &btype, sizeof(btype)); // send STP_EXIT
====
What I believe is happening is that the module starts exiting before the
process.end probe gets a chance to hit. More precisely, the module's session
state is probably no longer STAP_SESSION_RUNNING by the time the process.end
probe gets hit, so the probe gets skipped.
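To make the suspected ordering problem concrete, here is a toy model in plain C.
It is only a sketch of the theory above, not systemtap code: every toy_* name is
hypothetical, and the state check merely stands in for whatever the runtime
actually does when the session is no longer running.

```c
/* Hypothetical stand-ins for the module's session state.
 * These are NOT systemtap's real definitions. */
enum toy_session_state { TOY_SESSION_RUNNING, TOY_SESSION_STOPPING };

struct toy_session {
        enum toy_session_state state;
        int probes_run;
        int probes_skipped;
};

/* Models the probe entry check: the handler body only runs while the
 * session is still in the RUNNING state; otherwise the hit is skipped. */
static void toy_process_end_probe(struct toy_session *s)
{
        if (s->state != TOY_SESSION_RUNNING) {
                s->probes_skipped++;
                return;
        }
        s->probes_run++;
}

/* Models the module reacting to the exit request sent by stapio. */
static void toy_handle_exit_request(struct toy_session *s)
{
        s->state = TOY_SESSION_STOPPING;
}

/* Replays one of the two possible orderings.
 * Returns the number of times the probe handler actually ran. */
static int toy_replay(int exit_request_first)
{
        struct toy_session s = { TOY_SESSION_RUNNING, 0, 0 };

        if (exit_request_first) {
                toy_handle_exit_request(&s);
                toy_process_end_probe(&s);
        } else {
                toy_process_end_probe(&s);
                toy_handle_exit_request(&s);
        }
        return s.probes_run;
}
```

In this model toy_replay(0) corresponds to the passing runs (the probe fires
before the exit request is handled) and toy_replay(1) to the failing ones.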
I'm not sure of the best way to fix this. For a system where I have
consistently seen the problem, a 1/4 second sleep in stapio between getting the
signal and sending the STP_EXIT command works around the problem. But that is
an *ugly* workaround...
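For reference, the delay I tried amounts to something like the helper below. This
is a minimal sketch, assuming POSIX nanosleep() is available; toy_delay_ms is a
hypothetical name and is not part of stapio.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose nanosleep() under strict -std modes */
#endif
#include <errno.h>
#include <time.h>

/* Sleep for the given number of milliseconds, resuming with the remaining
 * time if a signal interrupts the sleep.
 * Returns 0 on success, -1 on any error other than EINTR. */
static int toy_delay_ms(long ms)
{
        struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
        struct timespec rem;

        while (nanosleep(&req, &rem) == -1) {
                if (errno != EINTR)
                        return -1;
                req = rem;  /* continue sleeping for whatever time is left */
        }
        return 0;
}
```

In the workaround, a call like toy_delay_ms(250) would sit between the waitpid()
check and the write() that sends STP_EXIT. It only makes the race less likely to
be lost; it does not guarantee the probe has run.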