This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Systemtap do_filp_open failure on a few linux packages
- From: Henrik /KaarPoSoft <henrik at kaarposoft dot dk>
- To: systemtap at sourceware dot org
- Date: Sat, 15 Jun 2013 23:52:01 +0200
- Subject: Systemtap do_filp_open failure on a few linux packages
Dear all,
I have experienced a very strange issue related to systemtap.
Any insights or help you might be able to provide in debugging this
further would be most appreciated.
I am developing a linux distribution called KaarPux:
http://kaarpux.kaarposoft.dk/
Using a few scripts, some 600+ Linux packages are built and installed.
Generally, this works like a charm.
In order to automatically collect package dependencies, I have created
a small systemtap script to show files opened for reading:
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/kx_open.stp
This script is basically a probe on kernel.function("do_filp_open").return
The script is compiled with
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/install_kx_open_stp.sh
The script is executed with the functions in
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/shinc/linux_functions.shinc
So, basically it runs "staprun -o $PIPE -c script_to_build_package",
writing into a $PIPE created beforehand.
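For readers unfamiliar with this kind of setup, the mechanism can be sketched roughly as below. This is a minimal sketch under assumptions, not the code from the KaarPux repository: the helper name run_with_probe, the FIFO created with mkfifo, and the use of fd 9 are all hypothetical. It relies only on the staprun options mentioned above: -o names the output file (here the FIFO) and -c names the command to run, with staprun exiting when that command does.

```shell
#!/bin/sh
# Minimal sketch, NOT the actual KaarPux code: run_with_probe and its
# arguments are hypothetical. It runs a build command under a precompiled
# systemtap module and collects the probe output through a named pipe (FIFO).
#
# Usage: run_with_probe MODULE.ko LOGFILE COMMAND...
run_with_probe() {
    local module="$1" log="$2"
    shift 2

    local dir
    dir=$(mktemp -d) || return 1
    local pipe="$dir/probe.fifo"
    mkfifo "$pipe" || { rmdir "$dir"; return 1; }

    # Reader: copy everything written to the FIFO into the log file.
    cat "$pipe" > "$log" &
    local reader=$!

    # Keep our own write end open on fd 9 (assumed to be free), so the
    # reader is never left blocked if staprun exits without opening the FIFO.
    exec 9> "$pipe"

    # staprun loads the module, runs the -c command and writes the probe
    # output to the -o file (the FIFO). "9>&-" keeps our extra write end
    # from being inherited by staprun and the build it starts.
    staprun -o "$pipe" -c "$*" "$module" 9>&-
    local status=$?

    exec 9>&-          # close our write end; the reader now sees EOF
    wait "$reader"
    rm -f "$pipe"
    rmdir "$dir"
    return "$status"
}
```

As a usage example (paths are hypothetical), a script compiled with `stap -p4 -m kx_open kx_open.stp` yields `kx_open.ko`, which could then be run as `run_with_probe kx_open.ko /tmp/opened.log ./build_package.sh`. The helper returns the build command's exit status, so a failing build stays visible to the caller.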
If I try to build all the 600+ packages with this probe enabled,
it ALMOST works.
For most of the 600+ packages, the build succeeds, and the probe
returns what seem to be reasonable results.
However, for a few packages, building fails:
- firefox
- thunderbird
- libreoffice
- ghc-binary
- ghc
I am a bit puzzled.
If I had made some stupid beginner's mistake, I would have expected all,
most, or at least a significant number of package builds to fail.
But only those 5 out of 600+ fail...
I have experienced similar problems for the last 6 to 12 months with
different kernel versions, systemtap versions, qemu versions, and
KaarPux versions.
So it does not seem to be a glitch with the current version combination.
BTW, I also experienced similar problems with an earlier script:
http://sourceforge.net/p/kaarpux/code/ci/e80f14f67fc7688a4d85661befb2b96a565b206a/tree/master/chroot_scripts/kx_open.stp
I never bothered to debug it, but now I have tried to dig deeper...
Currently I have:
linux: 3.9.3
systemtap: 2.2.1
firefox: 21.0
thunderbird: 17.0.6
ghc-binary: 7.4.1
ghc: 7.4.1
Host: i7-3970X on P9X79 WS
Virtual Machine: qemu kvm 1.5.0
When building firefox with and without systemtap, the first 36000+ lines
of the log are identical (except for some build identifiers);
then, with systemtap, I get:
---------- [BEGIN] ----------
Executing: c++ -o plugin-container -Wall -Wpointer-arith
-Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body
-Wno-invalid-offsetof -Wcast-align -fno-exceptions -fno-strict-aliasing
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
-std=gnu++0x -pthread -pipe -DNDEBUG -DTRIMMED -g -Os -freorder-blocks
-fomit-frame-pointer
/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/ipc/app/tmpgyKSzm.list
-lpthread -Wl,-z,noexecstack -Wl,--build-id
-Wl,-rpath-link,/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/dist/bin
-Wl,-rpath-link,/opt/kaarpux/firefox-21.0/lib -L../../dist/bin
-L../../dist/lib -ldl
-L/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/dist/bin
-lxpcom -lmozalloc -lxul -L//lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl
-Wl,--whole-archive ../../dist/lib/libmozglue.a
../../dist/lib/libmemory.a -Wl,--no-whole-archive -rdynamic -ldl
/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/ipc/app/tmpgyKSzm.list:
INPUT("MozillaRuntimeMain.o")
/bin/ld: warning: libhunspell-1.3.so.0, needed by
../../dist/bin/libxul.so, not found (try using -rpath or -rpath-link)
../../dist/bin/libxul.so: undefined reference to `Hunspell::spell(char
const*, int*, char**)'
../../dist/bin/libxul.so: undefined reference to
`Hunspell::Hunspell(char const*, char const*, char const*)'
../../dist/bin/libxul.so: undefined reference to
`Hunspell::suggest(char***, char const*)'
../../dist/bin/libxul.so: undefined reference to
`Hunspell::get_dic_encoding()'
../../dist/bin/libxul.so: undefined reference to `Hunspell::~Hunspell()'
collect2: error: ld returned 1 exit status
---------- [END] ----------
But libhunspell-1.3.so.0 IS indeed there.
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.
For thunderbird, the experience is similar.
For ghc-binary I get
---------- [BEGIN] ----------
configure GHC_BINARY
checking for path to top of build tree...
utils/ghc-pwd/dist-install/build/tmp/ghc-pwd: error while loading shared
libraries: libgmp.so.3: cannot open shared object file: No such file or
directory
configure: error: cannot determine current directory
Warning: child process exited with status 1
---------- [END] ----------
I thought this might have something to do with symbolic links, as
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/packages/g/ghc-binary.yaml
creates two symlinks before running configure.
However, again:
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.
(and there must be many packages with double symlinks anyway...)
For ghc I get
---------- [BEGIN] ----------
configure: WARNING: unrecognized options: --disable-dependency-tracking
checking for gfind... no
checking for find... /bin/find
checking for sort... /bin/sort
checking version of ghc... unknown
configure: error: Cannot determine the version of
/home/kaarpux/kaarpux/linux/build/opt/ghc-binary-7.4.1/bin/ghc. Is it
really GHC?
Warning: child process exited with status 1
---------- [END] ----------
And again:
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.
So, now I am stuck.
I could understand if the output of my probe were not what I expected.
Fine.
But how could a simple probe like this make building a package fail ???
Any input, help or comments would be most appreciated.
/Henrik