This is the mail archive of the
mailing list for the systemtap project.
Re: [PATCH -tip 4/5] kprobes/x86: Use text_poke_smp_batch
Mathieu Desnoyers wrote:
> * Masami Hiramatsu (email@example.com) wrote:
>> Mathieu Desnoyers wrote:
>>> * Masami Hiramatsu (firstname.lastname@example.org) wrote:
>>>> Use text_poke_smp_batch() in the optimization path to reduce
>>>> the number of stop_machine() calls.
>>>> Signed-off-by: Masami Hiramatsu <email@example.com>
>>>> Cc: Ananth N Mavinakayanahalli <firstname.lastname@example.org>
>>>> Cc: Ingo Molnar <email@example.com>
>>>> Cc: Jim Keniston <firstname.lastname@example.org>
>>>> Cc: Jason Baron <email@example.com>
>>>> Cc: Mathieu Desnoyers <firstname.lastname@example.org>
>>>> arch/x86/kernel/kprobes.c | 37 ++++++++++++++++++++++++++++++-------
>>>> include/linux/kprobes.h | 2 +-
>>>> kernel/kprobes.c | 13 +------------
>>>> 3 files changed, 32 insertions(+), 20 deletions(-)
>>>> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
>>>> index 345a4b1..63a5c24 100644
>>>> --- a/arch/x86/kernel/kprobes.c
>>>> +++ b/arch/x86/kernel/kprobes.c
>>>> @@ -1385,10 +1385,14 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
>>>> return 0;
>>>> -/* Replace a breakpoint (int3) with a relative jump. */
>>>> -int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
>>>> +#define MAX_OPTIMIZE_PROBES 256
>>> So what kind of interrupt latency does a 256-probes batch generate on the
>>> system ? Are we talking about a few milliseconds, a few seconds ?
>> From my experiment on kvm/4cpu, it took about 3 seconds on average.
> That's 3 seconds for multiple calls to stop_machine(). So we can expect
> latencies in the area of a few microseconds for each call, right?
But if we register more than 1000 probes at once, it is hard to do
anything else while the optimization runs (more than 10 seconds),
because it stops the machine so frequently.
>> With this patch, it went down to 30ms. (x100 faster :))
> This is beefing up the latency from a few microseconds to 30ms. It sounds like a
> regression rather than a gain to me.
If that is not acceptable, I can add a knob to control how many probes
are optimized/unoptimized at once. Anyway, the latency is predictable (it
comes after registering/unregistering probes), and it will be small if we
put only a few probes. (30ms is the worst case.)
And if you want, optimization can be disabled by sysctl.
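
To make the trade-off concrete, here is a minimal sketch of how the batch
size changes the number of stop_machine() rounds. It is not code from the
patch: stop_machine_rounds() is a hypothetical helper for illustration, and
only the MAX_OPTIMIZE_PROBES value of 256 comes from the diff above. It
counts rounds only and does not model how long each round takes.

```c
#include <assert.h>

/* Batch limit taken from the patch above. */
#define MAX_OPTIMIZE_PROBES 256

/*
 * Hypothetical helper (not part of the patch): number of
 * stop_machine() rounds needed to optimize nr_probes probes when
 * up to 'batch' probes are patched per round. This is a ceiling
 * division of nr_probes by batch.
 */
static unsigned int stop_machine_rounds(unsigned int nr_probes,
					unsigned int batch)
{
	assert(batch > 0);	/* a zero-sized batch makes no progress */
	return nr_probes / batch + (nr_probes % batch != 0);
}
```

For 1000 probes, stop_machine_rounds(1000, 1) gives 1000 rounds, while
stop_machine_rounds(1000, MAX_OPTIMIZE_PROBES) gives 4. Each batched round
holds the machine longer, which is the latency concern raised above; a knob
for the batch size would let that be tuned.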