This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
beaverton meeting minutes
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- To: systemtap at sources dot redhat dot com
- Date: Tue, 22 Nov 2005 17:33:35 -0500
- Subject: beaverton meeting minutes
Hi -
Here is my set of rambly minutes from last week's group meetings,
hosted by IBM Beaverton. Thanks, guys! This was our second
large-scale face-to-face meeting, so it could be called "2f2f" if
you're into that kind of thing.
2005-11-15
09:00
- meeting startup w/ Jim at the lovely offices of IBM Beaverton (ex Sequent)
- attendees
Red Hat:
Martin Hunt, Will Cohen, Elena Zannoni, Frank Eigler, Graydon Hoare
Hitachi:
Yumiko Sugita, Satoshi Oshima, Hideo Aoki (?)
IBM:
Kevin Stafford, Hien Nguyen, Jim Keniston, Larry Kessler, Vara Prasad,
(by phone:) Ananth Mavinakayan, Prasanna Panchamukhi
Intel:
Josh Stone, Anil Keshavamurthy, Brad Chen
09:10
- jkenisto on kprobes
- kprobes, jprobes, retprobes on 386, x86-64, ia64, ppc64
- no one working on sparc64
- ezannoni: ia64 status? jkenisto: upstream
- ananth: hugemem-vs-gdb kprobes fixes in queue, bz# 171980
- ananth suspects translator at fault in pr# 1836
- jkenisto reviewing kprobes bugs:
- 1345/1808: kretprobes, jkenisto etc. still thinking about cure
- 1813: RCU may fix, have no RH kernel experience; ananth to check
in with RH kernel guys to check on RCU patch inclusion
- 1235: blacklist; kprobes-resident blacklisting already upstream, but
not RH kernels, possibly redundant with translator-side blacklist;
potential backport will miss RHEL4U3 deadline
- RCU patches in U3-candidate kernel, yay; ezannoni to find way to
distribute this to partners
- printk not generally safe: kernel-claimed safety not sufficient to us
- suggest closing 1235, and shove it into translator directly
- 1776: systemtap probes crash; raw kprobes better; details later
- 1303: wishlist item for probe handler crash detection
- re kretprobes stack traceback hygiene
- kprobes future:
- userspace prototyped
- userspace return not started
- "safe" user->kernel data copying still an open issue
- sysrq emergency disarm key: how bulletproof? from pessimal
circumstances? old code sets just a global flag for deferred
disarming; IBM experience indicates recovery from death-throes
very unlikely; hopeless if interrupts disabled anyway
- need testsuite
10:30
- fche gives talk on translator internals
11:10
- graydon talks about stats implementation
- new syntax meets RAVE reviews, people dancing in the streets for some reason
- some code checked in just before meeting, just shy of code generation
11:35
- brad.chen on checking etc.
- used binary rewriting tool "pin"
- tool had problems ... a pinhead, surely?
- new demo based on objdump feeding a perl script
- usefulness iffy as is
- but would be nice to see a list of kernel symbols used from probe context,
to detect printk-like problems
- dtrace safety beyond us by virtue of their restrictions (static probe
points); add analysis task to brad.chen bug #901
- maybe kprobes become guru-only if enough tapsets / static probes come in
- vara recalls old kernel-resident "tapset" concept from spring
- perhaps as means to provide kernel-side kprobe-candidacy whitelist
- q: how to get kernel developers to want to help us instrument their stuff
- issue: distribution/maintenance of tapsets - when kernel version drifts,
who should keep the scripts up-to-date?
- varap: kernel developers want to help, wish instrumentation to live in
kernel source repository
- supply macros for kernel developers to mark up instrument-worthy places
- cost question: how close to zero must dormant probes be to be acceptable?
- to pass data, it'll have to be non-zero cost
11:30
- lunch
- lovely sandwiches, thanks!!
- bug review
- bug 1594: possible kprobes-systemtap adaptation function - kprobe error
- RH compiler bug 169485 still outstanding (gcc4 backporting to gcc3.4)
- bug 1802: use -D MAXINSTANCES
- need %ifarch for e.g. system calls
- 907: would like $userptr->field sort of thing; jkenisto suggests => for
a single hop
- $target->field rework; bring back ${ptr->field.subfield}
- test 907 on RHEL4U2; claimed to work on 2.6.14
- could translator synthesize jprobes?
13:40
- applications
- implement usbmon in systemtap?
- jfs instrumentation
- block layer tapset - Jens Axboe expressed interest
- iostat
- iotop "based on systemtap" - Red Flag Linux
- hitachi raises issue of binary block/data passing; they have some
very high performance probing needed, <1us?
- grayche imagines binary tracing into circular mmap'd buffer
- "porting" dtrace providers
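The "binary tracing into circular mmap'd buffer" idea above can be sketched in a few lines. This is only an illustration of the data structure, not anything the group designed: the record layout (timestamp, probe id, value), the class name TraceRing, and the use of an anonymous user-space mapping are all assumptions made for the example.

```python
import mmap
import struct
from typing import List, Tuple

# Hypothetical fixed-size record: 64-bit timestamp, 32-bit probe id, 32-bit value.
RECORD = struct.Struct("<QII")


class TraceRing:
    """Fixed-size binary records in an anonymous mmap; oldest records get overwritten."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.buf = mmap.mmap(-1, slots * RECORD.size)  # anonymous mapping
        self.written = 0  # total number of records ever written

    def write(self, timestamp_ns: int, probe_id: int, value: int) -> None:
        # Wrap around to the start once the buffer is full.
        offset = (self.written % self.slots) * RECORD.size
        RECORD.pack_into(self.buf, offset, timestamp_ns, probe_id, value)
        self.written += 1

    def snapshot(self) -> List[Tuple[int, int, int]]:
        """Return the surviving records, oldest first."""
        count = min(self.written, self.slots)
        start = self.written - count
        return [
            RECORD.unpack_from(self.buf, ((start + i) % self.slots) * RECORD.size)
            for i in range(count)
        ]


if __name__ == "__main__":
    ring = TraceRing(slots=4)
    for i in range(6):
        ring.write(1000 + i, 7, i)
    print(ring.snapshot())
```

Writing packed records avoids any string formatting on the hot path, which is presumably the attraction for the sub-microsecond case Hitachi raised; a reader decodes the records later, off the critical path.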
14:00
- wcohen on testing
- review of existing buckets, test types
- bug 1808, still out there
- RHTS lagging behind systemtap development due to beehive rpm insuckage
- varap et al. will start regular testing of RHEL4 vs systemtap snapshots
- wcohen commits to dejagnuizing kprobes tests
- need arch-specific tests, code coverage analysis
- need volunteers for stress-tests
- decision: team will focus on RCU kprobes instead of classic kprobes
14:30
- systemtap demo by Hien to IBM group
- Mingming Cao (IBM ext3 contributor) excited even by iostats.stp
- probing during kernel boot time w/ statically loaded modules
- translator/runtime should perform NUMA memory allocation at load time
- fault injection interesting ($var or $retvalue writing)
- multiplatform interesting
- IBM's official favorite distro is: <no comment>
- caching systemtap modules could cut down compilation time
- code patching interesting, but how would systemtap be useful?
- remote/boot-time probe module injection; static linking of module,
operation w/o stpd
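The module caching suggestion above might look roughly like the following sketch. The key inputs (script text, kernel release, translator version), the function names, and the flat cache directory layout are assumptions for illustration; a real key would presumably also have to cover anything else that affects the generated module.

```python
import hashlib
import os
import tempfile
from typing import Callable, Tuple


def cache_key(script_text: str, kernel_release: str, translator_version: str) -> str:
    """Hash the (assumed) inputs that determine the compiled module."""
    digest = hashlib.sha256()
    for part in (script_text, kernel_release, translator_version):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return digest.hexdigest()


def get_or_build(
    cache_dir: str,
    script_text: str,
    kernel_release: str,
    translator_version: str,
    build: Callable[[str], bytes],
) -> Tuple[str, bool]:
    """Return (module path, cache hit?); call build() only on a miss."""
    key = cache_key(script_text, kernel_release, translator_version)
    path = os.path.join(cache_dir, key + ".ko")
    if os.path.exists(path):
        return path, True
    os.makedirs(cache_dir, exist_ok=True)
    module_bytes = build(script_text)  # the expensive translate+compile step
    # Write to a temporary file first so a crash never leaves a partial module.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(module_bytes)
    os.replace(tmp_path, path)
    return path, False
```

Here build stands in for whatever actually produces the module; the second run of an unchanged script against the same kernel skips it entirely.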
15:40
- djprobes demo
- overhead target: probing at 30 kHz with 1% overhead; 300 ns/probe
- handler must save/restore register state
- copied instructions must be PC-independent / relocatable
- cultural cross-pollination: japanese "foo bar" == "hoge huga"
- gettimeofday benchmark demo: djprobes 40ns, kprobes 520ns (overheads)
- gnuplot formatted output is neato
- they wrote several beautiful tapsets to replicate 10% of lkst trace points
- technique portable to x86-64, problematic for ppc64 (ToC reg?), ok
for modern ia64 (with atomic-store-16)
- kprobe on top of djprobe latter-day-bytes
- coexistence with kdb
- safety check for jmp+address insertion tricky with preemption/hw interrupts
- eligibility check for PC address tricky; current demos all use function
entrypc, simplifying the situation
- hitachi needs to work out actual algorithm for general safety checking
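The overhead figures in the djprobes notes above can be checked with some quick arithmetic. The sketch below assumes a single CPU and treats the gettimeofday benchmark overheads as the full per-hit cost, which is a simplification; the function names are made up for the example. A 1% budget at 30 kHz works out to about 333 ns per probe hit, which the minutes round to 300 ns.

```python
# Back-of-the-envelope check of the probe overhead figures in the minutes.
# Only the numbers (30 kHz, 1%, 40 ns, 520 ns) come from the minutes.

NS_PER_SECOND = 1_000_000_000


def per_probe_budget_ns(rate_hz: float, overhead_fraction: float) -> float:
    """Time each probe hit may cost for total overhead to stay within the fraction."""
    return overhead_fraction * NS_PER_SECOND / rate_hz


def max_rate_hz(per_probe_ns: float, overhead_fraction: float) -> float:
    """Highest probe hit rate sustainable at the given per-hit cost."""
    return overhead_fraction * NS_PER_SECOND / per_probe_ns


if __name__ == "__main__":
    budget = per_probe_budget_ns(30_000, 0.01)
    print(f"budget at 30 kHz, 1% overhead: {budget:.0f} ns/probe")
    for name, cost_ns in (("djprobes", 40), ("kprobes", 520)):
        rate_khz = max_rate_hz(cost_ns, 0.01) / 1000
        print(f"{name}: about {rate_khz:.1f} kHz at 1% overhead")
```

By the same arithmetic, the measured 40 ns djprobes overhead would fit the 1% budget up to 250 kHz, while the 520 ns kprobes overhead would exceed it above roughly 19 kHz.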
18:30
- group dinner at big "road house" style restaurant with wonderful
post and beam construction
- thanks Larry!
2005-11-16
09:00
- regrouping, sugary snacks just a few minutes late
- reverting to bug review, wish list, AR
- review of safety: brad.chen to bring over a static checker widget,
no new major efforts until users suggest specific areas
- graydon foresees translator-resident whitelist for letting us flip
switch of general dwarf probes to guru mode
- make $val rvalues guru-mode only at that time; no apparent hazard to
keeping them safe-mode visible at the present
- jkenisto reminds kprobe originally oriented toward expert users
- varap focus on kernel community engagement
- varap reminds of desire for user-space probes; couple of months of work
left
- kretprobes bug 1345 patch submitted
- RHEL5 kernel patch deadlines next spring, impacting IBM despite its
quasi-fictional placement
- jkenisto summarizes user-space probe prototypes
- inode+offset basis, as per dprobes
- script/java targeting pushed by sun
- static instrumentation - not necessarily a dwarf/kprobes thing
- static instrumentation user-space probes could expand to hard-coded
int3 in systemtap shlib
- desire compatibility with instrumentation inserted for dtrace's sake
- RH can investigate this part
- watchpoint probes
- hw support exists in multiple architectures
- problem: arbitration/sharing of hw control registers
- send info to roland as feature request for ptrace-rewrite
10:00
- evangelism, Garrett @ IBM joining us
- <gh@us.ibm.com>
- <customer> CTO had sun/linux competition
- we need more publicity
- need more problem state sharing
- there exist some neat kernel-space problems
- systemtap results could be used as evidence to kernel hackers to justify
fixes, if customer proprietary workloads were not shareable to demonstrate
the problems
- put tool into hands of pre-sales technical types, to diagnose customers'
solaris-to-linux porting problems
- IBM has "captured customers"; more understood problem space, relationship,
less need to dazzle them with linux
- IBM has kernel subsystem experts, offers constructing tapsets
- but: too many individual problems for general problem patterns
- lkml flame du jour: memory fragmentation
- need lkml problem monitoring, offering up systemtap solutions, awareness
- what obstacles exist for wannabe users?
- probe cross-compilation necessary (#1145)
- integrate elfutils builds into cvs/src systemtap build procedure; open-coding
the rpm bundling logic; include "portability patch"
11:00
- perhaps write IBM Red Book about systemtap
- need "standard demo" of systemtap up on web
- RHEL5: virtualization: xen kprobes
- user-level kretprobes may involve blocking
- djprobes "booster" for kprobes/kretprobes patches coming from Hitachi
- need algorithm to check for safe insertion point
- consensus on not requiring Hitachi to port widget to other architectures
- djprobes eligibility/safety checking computable in the kernel or, more
likely, in user space
- minimizing api sprawl good
11:30
- action item review [follows in separate email]
- manpower commitment [deferred]
- adjourned