This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [TAPSETS] Linux Kernel Event Trace Tool
* Guanglei Li (guanglei@cn.ibm.com) wrote:
>
> ----- Original Message -----
> From: "Guanglei Li" <guanglei@cn.ibm.com>
> To: "Mathieu Desnoyers" <compudj@krystal.dyndns.org>
> Cc: <systemtap@sourceware.org>; <jrs@us.ibm.com>
> Sent: Sunday, December 18, 2005 8:59 PM
> Subject: Re: [TAPSETS] Linux Kernel Event Trace Tool
>
>
> >> Mathieu:
> >>Just to throw some ideas :
> >>
> >>Which protection against concurrency between multiple data sources
> >>do you use
> >>(SMP, process vs interrupt) ?
> >>
> >>I have looked at the CVS SystemTAP code : the _stp_printk function
> >>is all but
> >>atomic.
> >
> >[...]
> >Anyway, I think systemtap itself should handle the concurrency
> >issue. Am I right?
> >
> >
> after thinking Mathieu's idea again, the touble may happen if a probe
> handler makes mutiple calls to _stp_printf and during its execution,
> it is migrated to another CPU.
> _stp_printf will get the current CPU id every time it is called.
>
This will happend on systems where preemption is enabled if you do not disable
preemption. You could save the cpu_id() once for the current logging, and then
keep the same until the end. The side-effect is that, sometimes, a CPU will
write in another cpu's buffer.
> If calling _stp_printf only one time in the probe handler, such issues
> should be avoid. Am I right?
>
If preemption is not disabled or interrupts disabled, you will have issues with
_stp_printf.
RelayFS does specify that it's up to the client to deal with locking. From the
_stp_printf function :
runtime/print.c:
#ifndef STP_RELAYFS
....
#else
#define _stp_printf(args...) _stp_sprintf(_stp_stdout,args)
#endif
runtime/string.c:_stp_sprintf()
Think that :
this function can be migrated between cpus (no preempt_disable)
this function can be reentered by a different thread (no preempt_disable)
this function can be reentered by a softirq handler (no bh_disable).
this function can be reentered by an IRQ handler, if they are probed (no irq
disable)
this function can be reentered by an NMI, if they are instrumented.
void _stp_sprintf (String str, const char *fmt, ...)
{
int num;
va_list args;
if (str == _stp_stdout) {
int cpu = smp_processor_id();
char *buf = &_stp_pbuf[cpu][STP_PRINT_BUF_START] + _stp_pbuf_len[cpu];
int size = STP_PRINT_BUF_LEN -_stp_pbuf_len[cpu] + 1;
/* This does not work with multiple writers :
* _stp_pbuf_len might have changed. */
va_start(args, fmt);
num = vsnprintf(buf, size, fmt, args);
/* You write to a buffer position.. while doing that, you might be
* interrupted by the sources listed above. If you didn't increment
* the _stp_pbuf_len[cpu] before, then you will get corruption. */
va_end(args);
if (unlikely(num >= size)) {
/* overflowed the buffer */
if (_stp_pbuf_len[cpu] == 0) { /* value could have changed... */
_stp_pbuf_len[cpu] = STP_PRINT_BUF_LEN;
_stp_print_flush();
} else {
*buf ='\0';
_stp_print_flush();
va_start(args, fmt);
_stp_vsprintf(_stp_stdout, fmt, args);
va_end(args);
}
} else {
_stp_pbuf_len[cpu] += num;
/* The position is only incremented after the write. Furthermore,
* it's not an atomic operation : if it's interrupted between the
* read and the write, you will lose an event. */
}
} else {
va_start(args, fmt);
num = vscnprintf(str->buf + str->len, STP_STRING_SIZE - str->len, fmt, args);
/* once again, not reentrant for str. */
va_end(args);
if (likely(num > 0))
str->len += num;
}
}
This is just to show that _stp_sprintf is not reentrant as said. Or maybe did I
miss an interrupt disable taken before that ? (it couldn't ensure NMI
reentrancy, but it's better than nothing)
What LTTng actually does is to provide a secure way to log to RelayFS buffers
from _any_ context, without doing anything more than preemption disabling and
atomic cmpxchg buffer space reservation.
> and after looked into the generated c code (stap -p3), I saw that
> _stp_print_flush() will be called at the end of every probe handler.
> the comments of _stp_print_flush() says this is a must. But each probe
> handler of my tapsets will print only a single row of data. calling
> _stp_print_flush() for every single line of data will be expensive.
> Any way to avoid this? such as not calling _stp_printf at end of probe
> handlers. The print buffer of each cpu is 8K which could be able to
> holds lots of lines of data.
>
>
What I see of the _stp_sprintf code shows me that it's flushed when the buffer
is full. I didn't look much further though, so the implementers should answer
better than me to this question.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68