This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
[Bug kprobes/2062] Return probes does not scale well on SMP box
- From: "jkenisto at us dot ibm dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sources dot redhat dot com
- Date: 7 Jul 2006 23:32:41 -0000
- Subject: [Bug kprobes/2062] Return probes does not scale well on SMP box
- References: <20051216010933.2062.anil.s.keshavamurthy@intel.com>
- Reply-to: sourceware-bugzilla at sourceware dot org
------- Additional Comments From jkenisto at us dot ibm dot com 2006-07-07 23:32 -------
(In reply to comment #15)
> Created an attachment (id=1147)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=1147&action=view)
> testing data Jim's patch
>
> Hi Jim,
> I manually applied your patch to 2.6.17.4/ppc64, and tested on an 8-way ppc64
> box. I used a multi-threaded app which creates 8 threads, and each thread
> calls getsid() in a loop. The tests show that kretprobes still don't
> scale well on an SMP box. Here are the results:
>
> <1> without being probed by stap:
...
> Total cpus: loops = 40000000, average = 5822 ns
>
> <2> probed by 'stap -e "probe syscall.getsid {}" -bM'
...
> Total cpus: loops = 40000000, average = 7688 ns
>
> <3> probed by 'stap -e "probe syscall.getsid.return {}" -bM'
...
> Total cpus: loops = 40000000, average = 25277 ns
>
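For readers who want to reproduce a test of this shape, here is a minimal sketch of such a benchmark. It is not the program used in comment #15, which is not shown in this thread. The names run_benchmark, worker_main and MAX_THREADS are hypothetical, and the way the average is computed (mean wall-clock time per call across all threads) is an assumption about what "average" means in the results above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64          /* hypothetical upper bound for this sketch */

struct worker {
    long loops;                 /* number of getsid() calls this thread makes */
    uint64_t elapsed;           /* wall-clock ns this thread spent in its loop */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    uint64_t start = now_ns();

    for (long i = 0; i < w->loops; i++)
        (void)getsid(0);        /* the system call being probed */
    w->elapsed = now_ns() - start;
    return NULL;
}

/* Returns the mean ns per getsid() call over all threads, or -1.0 on failure. */
static double run_benchmark(int nthreads, long loops_per_thread)
{
    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    uint64_t total = 0;
    int i, started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS && loops_per_thread > 0);
    for (i = 0; i < nthreads; i++) {
        w[i].loops = loops_per_thread;
        w[i].elapsed = 0;
        if (pthread_create(&tid[i], NULL, worker_main, &w[i]) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += w[i].elapsed;
    }
    if (started != nthreads)
        return -1.0;
    return (double)total / ((double)nthreads * (double)loops_per_thread);
}
```

Calling run_benchmark(8, 5000000) would match the 8 threads and the 40000000 total loops reported above, assuming the loops were split evenly across threads. Build with -pthread where the C library requires it.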
This is troubling. A kretprobe is more expensive than a kprobe, but it
shouldn't be that much more. Given uniprocessor performance ratios, I'd expect
numbers in the range of 8,000-10,000 ns, not 25,000 -- if lock contention
weren't a major factor.
What numbers do you get when you run the syscall.getsid.return test with the
"old" version of kretprobes?
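The comparison is easier to see as per-call overhead, that is, each probed average minus the unprobed baseline. A small sketch of that arithmetic follows; the helper name probe_overhead_ns is hypothetical, and the inputs are the three averages quoted from comment #15.

```c
#include <assert.h>

/*
 * Extra cost per call that a probe adds, given the unprobed baseline
 * and the probed average, both in nanoseconds.
 */
static long probe_overhead_ns(long baseline_ns, long probed_ns)
{
    assert(probed_ns >= baseline_ns);
    return probed_ns - baseline_ns;
}
```

With the quoted figures, probe_overhead_ns(5822, 7688) gives 1866 ns for the kprobe and probe_overhead_ns(5822, 25277) gives 19455 ns for the kretprobe, roughly ten times as much. The expected 8,000-10,000 ns range would correspond to an added cost of about 2,200-4,200 ns.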
> I also used oprofile to sample the kretprobe, and the sample data is similar
> to the data in this attachment.
>
The high number for .__spin_yield() confirms that there's significant lock
contention, and the high number for ._spin_lock_irqsave() (compared to
._spin_lock()) suggests that the contention is on the hash-bucket locks
(kretprobe_table_locks[]) rather than the per-kretprobe lock. This is a
surprise to me.
My version of stap doesn't appear to generate any calls to spin_lock_irqsave() in
the handlers. Could you please verify that that's true for your version? Run
stap with -p3 and check that the generated code contains no calls to
spin_lock_irqsave().
I did some experimentation and verified that hashing on current() provides a
reasonably good distribution of hash indexes. That is, a bunch of tasks
launched together and running concurrently seem to hash to different buckets.
(If you run them consecutively, they tend to re-use the same task_struct and
therefore the same hash bucket... but if they're not concurrent, there'd be no
contention, right?)
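The kind of check described above can be sketched in user space. This is an illustration only, not the experiment that was actually run: the 6-bit table size, the multiplier constant, and the names bucket_of and distinct_buckets are all assumptions, chosen to resemble a kernel-style multiplicative pointer hash.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_BITS  6                    /* assumed: 64 hash buckets */
#define TABLE_SIZE (1u << HASH_BITS)

/*
 * Multiplicative pointer hash: multiply by a large odd constant and
 * keep the top HASH_BITS bits of the 64-bit product.
 */
static unsigned bucket_of(const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;

    v *= 0x9e37fffffffc0001ULL;         /* assumed golden-ratio-style constant */
    return (unsigned)(v >> (64 - HASH_BITS));
}

/* Counts how many distinct buckets the n given pointers fall into. */
static unsigned distinct_buckets(void *const *ptrs, unsigned n)
{
    unsigned char seen[TABLE_SIZE] = {0};
    unsigned i, count = 0;

    assert(ptrs != 0);
    for (i = 0; i < n; i++) {
        unsigned b = bucket_of(ptrs[i]);

        if (!seen[b]) {
            seen[b] = 1;
            count++;
        }
    }
    return count;
}
```

Feeding distinct_buckets() the addresses of the structures owned by concurrently running tasks shows how many buckets they occupy. A task that re-uses the same address always maps to the same bucket, which matches the consecutive-run behavior noted above.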
A good thing to try next is to factor SystemTap out of the experiment. I'll
attach a .c file that's equivalent to "probe syscall.getsid.return {}".
--
http://sourceware.org/bugzilla/show_bug.cgi?id=2062