This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: stptracer-20060828 has released.
- From: Li Guanglei <guanglei at cn dot ibm dot com>
- To: Li Guanglei <guanglei at cn dot ibm dot com>
- Cc: Masami Hiramatsu <masami dot hiramatsu dot pt at hitachi dot com>, SystemTAP <systemtap at sources dot redhat dot com>, "Jose R. Santos" <jrs at us dot ibm dot com>, Satoshi Oshima <soshima at redhat dot com>, Hideo Aoki <haoki at redhat dot com>, Yumiko Sugita <yumiko dot sugita dot yf at hitachi dot com>, "Frank Ch. Eigler" <fche at redhat dot com>
- Date: Thu, 31 Aug 2006 17:51:55 +0800
- Subject: Re: stptracer-20060828 has released.
- Organization: IBM CSTL
- References: <44F58A51.6060806@hitachi.com> <44F69E3C.10308@cn.ibm.com> <44F6A491.8020008@cn.ibm.com>
Li Guanglei wrote:
Li Guanglei wrote:
[...]
Here you can see that gBTI is around 23% faster than _stp_printf(). In
fact a long time ago I used oprofile to find out where _stp_printf()
cost cpu time, and it showed that a lot time is spent on
_stp_vsnprintf().
Attached is the profiling data I got a minute ago by running
the tests I mentioned in my mail. I only kept the leading few lines of
the generated profiling file.
oprofile.all is generated by:
opreport -p STP_MODULE_DIR -lg > oprofile.all
oprofile.kernel is generated by:
opreport /boot/vmlinux-`uname -r`.debug -lg > oprofile.kernel
- Guanglei
As a comparison, I also paste the profiling data when probing using
lkst_getsid.stp. You can see that _stp_vsnprintf used by
lket_getsid.stp will cost ~51% total time while _stp_binary_write used
by lkst_getsid.stp will only cost ~4.7% total time.
- Guanglei
CPU: ppc64 POWER5, speed 1502.5 MHz (estimated)
Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000
samples % linenr info symbol name
125867 11.6242 (no location information) restore
93078 8.5960 op_model_power4.c:183 .hypervisor_bucket
91577 8.4574 kprobes.c:442 .kprobe_exceptions_notify
88532 8.1762 sys.c:1351 .sys_getsid
64783 5.9829 (no location information) syscall_error_cont
49629 4.5834 (no location information) program_check_pSeries
43473 4.0149 (no location information) single_step_pSeries
41871 3.8669 (no location information) system_call_common
37849 3.4955 (no location information) program_check_common
37300 3.4448 (no location information) system_call_pSeries
32540 3.0052 spinlock.c:66 ._read_lock
28474 2.6297 sys.c:133 .notifier_call_chain
22203 2.0505 (no location information) .ret_from_except_lite
21443 1.9803 (no location information) single_step_common
17651 1.6301 (no location information) .save_nvgprs
16164 1.4928 (no location information) system_call
15142 1.3984 sstep.c:56 .emulate_step
13674 1.2628 (no location information) syscall_exit
13070 1.2071 spinlock.c:154 ._spin_lock
12607 1.1643 traps.c:747 .program_check_exception
12568 1.1607 sys.c:220 .atomic_notifier_call_chain
11305 1.0440 pid.c:209 .find_pid
11031 1.0187 hash.h:33 .get_kprobe
10543 0.9737 (no location information) .ret_from_except
9915 0.9157 (no location information) .__copy_tofrom_user
8693 0.8028 traps.c:502 .single_step_exception
8685 0.8021 op_model_power4.c:191 .kernel_unknown_bucket
7412 0.6845 buffer.c:1433 .__find_get_block
6475 0.5980 pid.c:258 .pid_task
5260 0.4858 buffer.c:1269 .(bool, long double,...)(long, short,...)
4351 0.4018 kprobes.c:121 .set_current_kprobe
4336 0.4004 buffer.c:80 .unlock_buffer
4220 0.3897 relay.c:535 .relay_switch_subbuf
3945 0.3643 locks.c:29 .__spin_yield
3349 0.3093 slab.c:3371 .kmem_cache_free
2929 0.2705 pid.c:271 .find_task_by_pid_type
2853 0.2635 filemap.c:1978 .generic_file_buffered_write
2280 0.2106 slab.c:3145 .kmem_cache_alloc
2029 0.1874 wait.c:241 .bit_waitqueue
1790 0.1653 bitops.c:19 .find_next_bit
1790 0.1653 (no location information) .memset
1747 0.1613 filemap.c:507 .unlock_page
1708 0.1577 fs-writeback.c:56 .__mark_inode_dirty
1692 0.1563 spinlock.c:73 ._spin_lock_irqsave
1660 0.1533 mutex.c:193 .__mutex_lock_slowpath
1649 0.1523 slab.c:2992 .free_block
1546 0.1428 wait.c:208 .__wake_up_bit
CPU: ppc64 POWER5, speed 1502.5 MHz (estimated)
Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000
samples % linenr info image name app name symbol name
125867 8.2421 (no location information) vmlinux vmlinux restore
117896 7.7202 lglzy.c:918 lglzy.ko lglzy .enter_probe_1832
93078 6.0950 op_model_power4.c:183 vmlinux vmlinux .hypervisor_bucket
91577 5.9967 kprobes.c:442 vmlinux vmlinux .kprobe_exceptions_notify
88532 5.7973 sys.c:1351 vmlinux vmlinux .sys_getsid
72270 4.7324 lglzy.c:269 lglzy.ko lglzy ._stp_binary_write
65678 4.3008 (no location information) jbd jbd (no symbols)
64783 4.2422 (no location information) vmlinux vmlinux syscall_error_cont
49629 3.2498 (no location information) vmlinux vmlinux program_check_pSeries
43473 2.8467 (no location information) vmlinux vmlinux single_step_pSeries
42525 2.7847 lglzy.c:165 lglzy.ko lglzy .common_probe_prologue
41871 2.7418 (no location information) vmlinux vmlinux system_call_common
39245 2.5699 lglzy.c:187 lglzy.ko lglzy .common_probe_epilogue
37849 2.4785 (no location information) vmlinux vmlinux program_check_common
37300 2.4425 (no location information) vmlinux vmlinux system_call_pSeries
34824 2.2804 (no location information) ext3 ext3 (no symbols)
32540 2.1308 spinlock.c:66 vmlinux vmlinux ._read_lock
28474 1.8646 sys.c:133 vmlinux vmlinux .notifier_call_chain
23742 1.5547 (no location information) libc-2.3.4.so libc-2.3.4.so .__GI_getsid
22203 1.4539 (no location information) vmlinux vmlinux .ret_from_except_lite
21443 1.4041 (no location information) vmlinux vmlinux single_step_common
17651 1.1558 (no location information) vmlinux vmlinux .save_nvgprs
16164 1.0585 (no location information) vmlinux vmlinux system_call
15142 0.9915 sstep.c:56 vmlinux vmlinux .emulate_step
13674 0.8954 (no location information) vmlinux vmlinux syscall_exit
13070 0.8559 spinlock.c:154 vmlinux vmlinux ._spin_lock
12607 0.8255 traps.c:747 vmlinux vmlinux .program_check_exception
12568 0.8230 sys.c:220 vmlinux vmlinux .atomic_notifier_call_chain
11305 0.7403 pid.c:209 vmlinux vmlinux .find_pid
11031 0.7223 hash.h:33 vmlinux vmlinux .get_kprobe
10543 0.6904 (no location information) vmlinux vmlinux .ret_from_except
10235 0.6702 lglzy.c:326 lglzy.ko lglzy .function_lkst_trace_n
9988 0.6540 print.c:91 lglzy.ko lglzy ._stp_print_flush
9915 0.6493 (no location information) vmlinux vmlinux .__copy_tofrom_user
9830 0.6437 (no location information) app app .thread_func
8693 0.5692 traps.c:502 vmlinux vmlinux .single_step_exception
8685 0.5687 op_model_power4.c:191 vmlinux vmlinux .kernel_unknown_bucket
8635 0.5654 (no location information) oprofiled oprofiled (no symbols)
7412 0.4854 buffer.c:1433 vmlinux vmlinux .__find_get_block