Performance issue with systemd-coredump and container process linking 2000 shared libraries.

Romain GEISSLER romain.geissler@amadeus.com
Tue Jun 20 22:05:29 GMT 2023


> On 20 June 2023 at 23:37, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi,
> 
> On Mon, Jun 19, 2023 at 05:08:50PM +0200, Mark Wielaard wrote:
> 
> So I made a mistake here. Since I was testing on fedora 38 which has
> DEBUGINFOD_URLS set. Without DEBUGINFOD_URLS set there is no big
> slowdown.
> 
> Do you have the DEBUGINFOD_URLS environment variable set?
> 
> The real sd-coredump will not have DEBUGINFOD_URLS set (I hope).
> 
> Thanks,
> 
> Mark

Hi,

Our real use case happens on an OpenShift 4.13 node, so the OS is Red Hat CoreOS 9 (which I assume shares a lot of foundations with RHEL 9).

On our side, Francois also told me this afternoon that my reproducer posted here doesn’t really reproduce the real systemd-coredump issue he witnessed live. He also noticed that with DEBUGINFOD_URLS unset or set to the empty string, my reproducer no longer shows any problem. What he saw in the real case (using perf/gdb) was that lots of time was apparently spent in elf_getdata_rawchunk, often in this kind of stack:

Samples: 65K of event 'cpu-clock:pppH', Event count (approx.): 16468500000
Overhead  Command         Shared Object             Symbol
  98.24%  (sd-parse-elf)  libelf-0.188.so           [.] elf_getdata_rawchunk
   0.48%  (sd-parse-elf)  libelf-0.188.so           [.] 0x00000000000048a3
   0.27%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getphdr
   0.11%  (sd-parse-elf)  libc.so.6                 [.] _int_malloc
   0.10%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getnote
   0.06%  (sd-parse-elf)  libc.so.6                 [.] __libc_calloc
   0.05%  (sd-parse-elf)  [kernel.kallsyms]         [k] __softirqentry_text_start
   0.05%  (sd-parse-elf)  libc.so.6                 [.] _int_free


(gdb) bt
#0  0x00007f0ba8a88194 in elf_getdata_rawchunk () from target:/lib64/libelf.so.1
#1  0x00007f0ba98e5013 in module_callback.lto_priv () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#2  0x00007f0ba8ae7291 in dwfl_getmodules () from target:/lib64/libdw.so.1
#3  0x00007f0ba98e6dc0 in parse_elf_object () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#4  0x0000562c474f2d5e in submit_coredump ()
#5  0x0000562c474f57d1 in process_socket.constprop ()
#6  0x0000562c474efbf8 in main ()

My reproducer doesn’t fully re-implement what systemd does (the parsing of the package metadata is clearly omitted), so I thought I had reproduced the same problem when apparently I hadn’t; sorry for that. We will also have to double-check whether just using 2000 dummy libraries is really enough, or whether it also takes a more complex binary like the one in our real case.
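
As a rough illustration of what a "2000 dummy libraries" setup can look like (this is not the reproducer posted earlier in this thread), the sketch below writes one trivial C source per library plus a main.c that calls each function and then aborts to trigger a core dump. The function name write_dummy_project, the dummyNNNN naming scheme and the use of gcc are all assumptions made for the example; it only generates files and returns the build commands, it does not run them.

```python
import pathlib
import tempfile


def write_dummy_project(root, count):
    """Write `count` one-function C sources and a main.c under `root`.

    Returns the list of build commands (one per library, then the final
    link of the main binary). Names and flags are illustrative assumptions.
    """
    root = pathlib.Path(root)
    commands = []
    decls = []
    calls = []
    # --no-as-needed keeps every library as a DT_NEEDED entry;
    # $ORIGIN is literal here because the command is an argv list, not a shell string.
    link = ["gcc", "-o", "main", "main.c", "-L.",
            "-Wl,-rpath,$ORIGIN", "-Wl,--no-as-needed"]
    for i in range(count):
        name = f"dummy{i:04d}"
        (root / f"{name}.c").write_text(f"int {name}(void) {{ return {i}; }}\n")
        commands.append(["gcc", "-shared", "-fPIC", "-o", f"lib{name}.so", f"{name}.c"])
        decls.append(f"int {name}(void);")
        calls.append(f"    {name}();")
        link.append(f"-l{name}")
    main_src = (
        "#include <stdlib.h>\n"
        + "\n".join(decls)
        + "\nint main(void) {\n"
        + "\n".join(calls)
        + "\n    abort(); /* crash on purpose to produce a core dump */\n}\n"
    )
    (root / "main.c").write_text(main_src)
    commands.append(link)
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmds = write_dummy_project(tmp, 2000)
        print(len(cmds), "build commands generated in", tmp)
```

Each returned command is meant to be run from the generated directory, for example with subprocess.run(cmd, cwd=root, check=True). Whether such trivial libraries are enough to trigger the slowdown is exactly the open question above.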

Tomorrow on our side we will have to play a bit with a local build of systemd-coredump and try to run it manually to better understand what’s going wrong.


Note: when I wrote and tested my reproducer, I used a fedora:38 container, which doesn’t have DEBUGINFOD_URLS set (this may differ from a real, non-containerized Fedora 38):

[root@7563ccfb7a39 /]# printenv|grep DEBUGINFOD_URLS
[root@7563ccfb7a39 /]# find /etc/profile.d/|grep debug
[root@7563ccfb7a39 /]# cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Container Image)"
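
To compare behaviour with and without the variable, independently of how a given shell or container sets it up, something like the following minimal sketch could be used. It runs a command once with the inherited environment and once with DEBUGINFOD_URLS removed, and prints the elapsed time of each run. The helper names env_without_debuginfod and timed_run are made up for this example, and the command to run (the reproducer and its arguments) is whatever is passed on the command line.

```python
import os
import subprocess
import sys
import time


def env_without_debuginfod(base=None):
    """Return a copy of the environment with DEBUGINFOD_URLS removed."""
    env = dict(os.environ if base is None else base)
    env.pop("DEBUGINFOD_URLS", None)
    return env


def timed_run(argv, env):
    """Run argv with the given environment and return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(argv, env=env, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage: pass the reproducer command line as arguments.
    # With no arguments, a no-op Python child is timed instead.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print("inherited env :", timed_run(cmd, dict(os.environ)))
    print("variable unset:", timed_run(cmd, env_without_debuginfod()))
```

If both timings are close, the debuginfod lookups are probably not the cause of the slowdown for that command.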

Cheers,
Romain


